Featured Research: Spoken Language Processing

ICSI's Speech Group has grown significantly since its early days as the Realization Group. The group continues to stay true to its roots as an innovator in the field of automatic speech recognition (ASR) with a unique focus on fostering worldwide collaborations. At the same time, the research scope of the group is expanding and now includes speaker recognition, speech processing of multiparty meetings, dialog systems and information distillation, and porting of ASR systems to work in multiple languages. A recent project of the Speech Group also involved making speech technology accessible and useful to illiterate people in third world villages (see September 2006 Gazette).

As the group nears the end of its second decade of speech research, the success of its collaborations is striking. Hervé Bourlard, one of the first visitors to the Speech Group, now heads IDIAP, a close ICSI collaborator on speech processing for meetings, and also serves on ICSI's Board of Trustees. He is just one Speech alum who continues to contribute; alumni from the group have a way of finding themselves back at ICSI for research visits. Current staff scientists Nikki Mirghafori, Andreas Stolcke, and Chuck Wooters are all group alumni. Dan Ellis, one of the featured alumni in this issue, continues to work with ICSI on meeting research even though he is now an associate professor at Columbia University in New York. He is one of many group alumni who have gone on to create collaborations between ICSI and their new universities or labs. Our other featured alum, Eric Fosler-Lussier, continues to develop speech processing techniques he first worked on while at ICSI. In addition, thriving visitor programs continue to bring the fresh insights and perspectives of young, up-and-coming scientists who visit the Speech Group from overseas.

Our featured project for this issue, the Global Autonomous Language Exploitation (GALE) project, embodies the spirit of collaboration at ICSI as well as the quality of its research, and is challenging the speech staff to pursue exciting new directions in spoken language processing.

GALE's ambitious goals require significant improvement of speech recognition, diarization, sentence segmentation, machine translation, and information distillation in English, Mandarin Chinese, and Arabic.

ICSI's participation in GALE is part of a large collaborative effort called Nightingale, led by SRI and comprising researchers from 15 sites, among them IDIAP, the University of Washington, and Columbia University. The SRI team is one of three working on the GALE effort; the other two are led by BBN and IBM. We focus here on ICSI's contributions to Nightingale.

Automatic Speech Recognition (ASR)

ICSI researchers applied ASR methods proven on American English to Mandarin and Arabic, with very promising results. These methods significantly reduced the word error rate (WER) for both Mandarin and Arabic in initial tests performed at the University of Washington and SRI. Encouraged by these early results, scientists began incorporating new acoustic features, as well as modifications to the ASR front end, which are already yielding further WER reductions. The new approach may be adapted for use with SmartWeb, another collaborative speech project involving ICSI scientists. (More information on SmartWeb is available at www.smartwebproject.de.)
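For readers unfamiliar with the metric: WER counts the minimum number of word substitutions, deletions, and insertions needed to turn a system's output into the reference transcript, divided by the reference length. A minimal sketch of the standard edit-distance computation (illustrative only, not ICSI's actual scoring tools):

```python
# Word error rate (WER) via edit distance.
# WER = (substitutions + deletions + insertions) / reference length.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub,            # substitution (or match)
                          d[i - 1][j] + 1,  # deletion
                          d[i][j - 1] + 1)  # insertion
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis with one wrong word out of four gives a WER of 0.25.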


Speaker Diarization

When analyzing the output of ASR systems, one challenge is to determine who is speaking and when a new speaker enters the conversation. ICSI researchers have developed highly successful methods for automatically parsing the speech signal according to who speaks when, a task called diarization. The ICSI diarization system was originally developed as part of the ongoing effort in processing speech from meetings, all of which are in English. For the GALE project, researchers processed 696 hours of new data in English, Mandarin, and Arabic.
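Many diarization systems work bottom-up: start with many short segments and repeatedly merge the most similar pair until the remaining clusters correspond to distinct speakers. The toy sketch below illustrates that agglomerative idea with made-up segment features and a hypothetical distance threshold; real systems, including ICSI's, use rich acoustic models rather than raw vectors like these.

```python
# Toy bottom-up diarization: greedily merge the closest pair of
# segment clusters until all remaining clusters are farther apart
# than a threshold. Features and threshold are illustrative.

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def diarize(segments, threshold=1.0):
    # Each cluster holds (segment indices, member feature vectors).
    clusters = [([i], [v]) for i, v in enumerate(segments)]
    while len(clusters) > 1:
        # Find the closest pair of clusters by centroid distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i][1]), centroid(clusters[j][1]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:  # remaining clusters = distinct speakers
            break
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    # Label each segment with its cluster ("speaker") id.
    labels = [0] * len(segments)
    for spk, (idxs, _) in enumerate(clusters):
        for i in idxs:
            labels[i] = spk
    return labels
```

Given four segments whose features fall into two well-separated groups, the sketch assigns the first two segments one speaker label and the last two another.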

Sentence Segmentation

Sentence segmentation is the automatic division of a stream of words into sentences. It not only makes the output of an ASR system much more comprehensible to a human, but is also crucial for downstream processing of the speech signal, such as machine translation and information extraction.
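As a simple illustration of the idea: pauses are one strong cue to sentence boundaries, and the sketch below inserts a boundary wherever the silence after a word exceeds a threshold. The pause values and threshold are illustrative assumptions; actual systems combine prosodic and lexical cues in statistical models.

```python
# Minimal pause-based sentence segmentation sketch.
# words: ASR output tokens; pauses[i]: seconds of silence after words[i].
def segment(words, pauses, pause_threshold=0.5):
    sentences, current = [], []
    for word, pause in zip(words, pauses):
        current.append(word)
        if pause > pause_threshold:      # long pause -> sentence boundary
            sentences.append(" ".join(current))
            current = []
    if current:                          # flush any trailing words
        sentences.append(" ".join(current))
    return sentences
```

A long pause after "world" in the stream "hello world how are you" would split it into "hello world" and "how are you".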

Information Distillation

Information distillation is the extraction of the most useful pieces of information related to a given query from massive multilingual audio and text documents. As an example, if someone needed information about an individual and had some audio and text files to search through, a system that performs information distillation could be very useful. Instead of manually scanning through text documents and listening to the audio files, which is extremely time consuming, a person could use the distillation system to find relevant information in the files in seconds. Distillation work at ICSI, led by Dilek Hakkani-Tur, began in May and produced positive results as early as June, in the first round of NIST evaluations.
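One ingredient of such a system is ranking documents by relevance to a query. The sketch below does this with an IDF-style weighting, so that rare query words count more than common ones; it is illustrative only, since a real distillation system adds cross-lingual search, extraction, and summarization on top of retrieval.

```python
# Sketch of query-based document ranking with IDF-like weighting:
# score each document by its query-word counts, weighted by how rare
# each query word is across the collection.
import math
from collections import Counter

def rank(query, docs):
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Inverse document frequency: rare words count more (smoothed).
    idf = {}
    for word in set(query.lower().split()):
        df = sum(1 for toks in tokenized if word in toks)
        idf[word] = math.log((n + 1) / (df + 1))
    scores = []
    for i, toks in enumerate(tokenized):
        counts = Counter(toks)
        scores.append((sum(counts[w] * idf[w] for w in idf), i))
    # Most relevant documents first.
    return [docs[i] for _, i in sorted(scores, reverse=True)]
```

For the query "festival minister", a document containing both words outranks documents containing only one or neither.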

Other Speech Processing Research at ICSI

In addition to the GALE effort, the Speech Group continues to work on ASR for meetings, dialog systems, and speaker recognition, as well as the Tamil speech recognizer, a project bringing access to information technology (via a voice controlled user interface) to remote villages in India. Most of the visiting scientists to the Speech Group participate through two international collaborative projects focused on meetings, Interactive Multimodal Information Management (IM2) and Augmented Multi-party Interaction Distance Access (AMIDA, the successor to AMI).

To learn more about current research by the Speech Group, browse the group's publications at www.icsi.berkeley.edu/cgi-bin/pubs/index.pl (select "Speech" from the drop-down menu) or read the Speech section of the 2006 Annual Report, which can be downloaded in PDF format (select "ICSI" from the drop-down menu on the publications page). The Annual Report contains detailed technical information on all areas of research at ICSI from the past year.