AI Projects

Identifying semantic components

Identifying potential threats related to weapons of mass destruction (WMD) before they materialize requires the ability to discover and analyze low-observable WMD-related information in data of all types, including social media. To help build the robust natural language understanding (NLU) systems needed for this goal, this project investigates the automatic identification of semantic components: sub-lexical elements of linguistic meaning that can be composed in different ways to capture the meanings of words.

Multilingual FrameNet: Merging FrameNets for Cross-linguistic Research

One of the greatest challenges for NLP is the increasing variety of languages on the Internet. Part of the answer to this challenge can come from the FrameNet lexical database, which has been developed for English since 1997 at the International Computer Science Institute (ICSI) based on the principles of Frame Semantics (Fillmore 1977; Fillmore 1985). The lexicon is organized by semantic frames, with valence information derived from attested, manually annotated corpus examples (Fillmore & Baker 2010).
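
As a rough illustration of what "organized by semantic frames, with valence information derived from annotated examples" can mean in practice, the sketch below models a frame-based lexicon in Python. It is a minimal sketch under stated assumptions: the class names, field names, and the sample frame, frame elements, and sentences are toy data invented for this example, and the code does not reflect FrameNet's actual data format or tooling.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One annotated sentence (hypothetical structure, toy data only)."""
    text: str
    lexical_unit: str          # the word that evokes the frame, e.g. "buy.v"
    elements: dict[str, str]   # frame element name -> text span that fills it


@dataclass
class Frame:
    """A semantic frame with its roles, evoking words, and annotated examples."""
    name: str
    frame_elements: list[str]
    lexical_units: list[str]
    annotations: list[Annotation] = field(default_factory=list)

    def valence_patterns(self, lexical_unit: str) -> Counter:
        """Count the frame-element combinations attested for one lexical unit."""
        patterns = Counter()
        for ann in self.annotations:
            if ann.lexical_unit == lexical_unit:
                patterns[tuple(sorted(ann.elements))] += 1
        return patterns


# Toy frame and sentences, invented for illustration.
buy = Frame(
    name="Commerce_buy",
    frame_elements=["Buyer", "Goods", "Seller"],
    lexical_units=["buy.v", "purchase.v"],
)
buy.annotations += [
    Annotation("Kim bought a car from Lee.", "buy.v",
               {"Buyer": "Kim", "Goods": "a car", "Seller": "Lee"}),
    Annotation("Kim bought a car.", "buy.v",
               {"Buyer": "Kim", "Goods": "a car"}),
    Annotation("Lee purchased a book.", "purchase.v",
               {"Buyer": "Lee", "Goods": "a book"}),
]

# The lexicon is indexed by lexical unit, but the frame is the organizing entry.
lexicon: dict[str, list[Frame]] = {}
for frame in [buy]:
    for lu in frame.lexical_units:
        lexicon.setdefault(lu, []).append(frame)

patterns = buy.valence_patterns("buy.v")
for pattern, count in patterns.items():
    print(pattern, count)
```

The point of the sketch is the direction of derivation: valence patterns are not stipulated in advance but counted from the annotated sentences attached to each frame.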


The FrameNet project is building a semantically rich lexicon of English and a corresponding set of annotated texts, based on more than 600 semantic frames and 130,000 sentences. Comparable FrameNet projects are underway for Spanish, German, and other languages. By providing a layered semantic representation of text, FrameNet delivers a key component of next-generation question answering, machine translation, and other natural language processing applications. Learn more on the FrameNet Web site.


The NTL (Neural Theory of Language) project of the AI Group works in collaboration with other units on the UC Berkeley campus and elsewhere. It combines basic research in several disciplines with applications to natural language understanding systems. Basic efforts include studies in the computational, linguistic, neurobiological, and cognitive bases for language and thought. This research continues to yield a variety of theoretical and practical findings.

Previous Work: Extracting Event Attributes from Unstructured Textual Data for Persistent Situational Awareness

In this collaborative project with Decisive Analytics Corporation (DAC), FrameNet researchers are developing semantic frames for representing the attributes of complex events, which permit more fine-grained analysis than other event recognition frameworks. The researchers are developing event recognition methods focused on organizations and how they plan and carry out actions. These methods are broadly applicable to actions planned and carried out by all types of organizations, such as corporations, government agencies, military units, and insurgent groups. 

Previous Work: Preserving Unwritten Languages

In this project, researchers at ICSI are collaborating with Notre Dame to preserve unwritten languages in danger of disappearing. They are recording speech in a variety of genres and styles using mobile technologies. To enable productive linguistic and language-technology research in the future, they are adding respeaking, in which native speakers listen to each phrase and repeat it slowly and carefully, as well as oral translation, in which bilingual speakers of the language translate the recordings phrase by phrase into a widely used language such as English.

Previous Work: MetaNet: A Multilingual Metaphor Repository

Researchers from ICSI, UC San Diego, the University of Southern California, and UC Merced are building a system capable of understanding metaphors used in American English, Iranian Persian, Russian as spoken in Russia, and Mexican Spanish. The team includes computer scientists, linguists, psychologists, and cognitive scientists.

Previous Work: California Connects

California Connects is a state-level program administered by the Foundation for California Community Colleges that seeks to advance digital opportunity for underserved communities by promoting and enabling digital competency. Among other services, the program provides laptops to community college students, who in return teach people in their communities how to use computers and the Internet. The program also provides free classes in low-income Central Valley communities. The California Connects team at ICSI provides research support for the initiative, evaluating the program's structure and effectiveness in the context of its target population and making recommendations for its future.

Previous Work: BFOIT

BFOIT (the Berkeley Foundation for Opportunities in Information Technology) supports historically underrepresented ethnic minorities and women in their desire to become leaders in the fields of computer science, engineering, and information technology. The intent is to provide youth with knowledge, resources, practical programming skills, and guidance in their pursuit of higher education and production of technology. For more information, visit the BFOIT Web site.

Previous Work: Color, Language, and Thought

In 1978, the World Color Survey (WCS) collected color-naming data in 110 unwritten languages from around the world. The ICSI WCS staff (Paul Kay and Richard Cook of ICSI, Terry Regier of the University of Chicago) compiled these data into a single database that is available to the scientific community. Several outside laboratories have already used this database for studies.