How Diarization Aids in Video Concept Detection

Gerald Friedland

ICSI

Tuesday, February 14, 2012
12:30pm

"Concepts without percepts are empty; percepts without concepts are blind" -- Immanuel Kant, 1724-1804

Given the exponential growth of videos published on the Internet, mechanisms for clustering, searching, and browsing large numbers of videos have become a major research area. More importantly, there is a demand for event detectors that go beyond simply finding objects and instead detect more abstract concepts, such as "feeding an animal" or a "wedding ceremony." ICSI has been fortunate to become part of IARPA's Aladdin program. The program aims to describe the content of a "found" (i.e., consumer-produced) video based on a set of example videos. The computer learns concepts from example videos and then recounts the concepts seen in the query videos. The task is performed on a large set (100k) of "found" Internet videos. ICSI is working in a team together with SRI/Sarnoff, CMU, UMass, UCF, and Cycorp [1]. ICSI's contributions are acoustic, visual, and multimodal. This talk summarizes last year's acoustic effort.
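To make the task setup concrete, the following is a minimal, purely illustrative sketch of training a per-concept detector from example videos and ranking query videos by confidence. It is not the AURORA system or ICSI's actual method; the feature extractor, feature dimensions, and classifier choice are all hypothetical stand-ins for the acoustic, visual, and multimodal features used in practice.

```python
# Illustrative sketch only: train one binary detector per concept from example
# videos and score query videos. The feature extractor below is a hypothetical
# placeholder, not the actual acoustic features used in the Aladdin/AURORA work.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def extract_audio_features(num_videos, dim=64):
    """Placeholder for per-video acoustic features (e.g., one fixed-length
    histogram of audio events per video); here we just draw random vectors."""
    return rng.random((num_videos, dim))

# Example videos for one concept (e.g., "wedding ceremony") vs. background videos.
positives = extract_audio_features(50)
negatives = extract_audio_features(200)
X_train = np.vstack([positives, negatives])
y_train = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])

# Train a simple per-concept detector from the example videos.
detector = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score a large collection of "found" query videos and rank them by confidence.
queries = extract_audio_features(1000)
scores = detector.predict_proba(queries)[:, 1]
ranking = np.argsort(scores)[::-1]
print("Top 5 query videos for the concept:", ranking[:5])
```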

[1] H. Cheng, A. Tamrakar, S. Ali, Q. Yu, O. Javed, J. Liu, A. Divakaran, H. S. Sawhney, A. Hauptmann, M. Shah, S. Bhattacharya, M. Witbrock, J. Curtis, G. Friedland, R. Mertens, T. Darrell, R. Manmatha, J. Allan: "Team SRI-Sarnoff's AURORA System @ TRECVID 2011", Proceedings of TRECVID 2011, Gaithersburg, MD, December 2011.