Recording and Translating the World’s Unwritten Languages

Steven Bird

University of Melbourne

Tuesday, October 28, 2014
12:30 p.m., Conference Room 5A

This talk will report progress on a new approach to preserving unwritten languages. To enable a language to be studied once it is no longer spoken, we need about ten million words of narratives and conversations, the same order of magnitude that exists for classical Latin or ancient Greek. With community involvement and specialised software running on inexpensive smartphones, it is feasible to capture such quantities – about 1000 hours – along with some translations into related languages and English. The goal is to develop algorithms for collecting and deciphering this parallel data, integrating phonetic recognisers and translation technologies, and providing the underpinnings for a large-scale, computational approach to language preservation.

Bio:

Steven Bird is investigating scalable methods for bridging remnant speech communities with the digital philologists of the future. His other projects include the Natural Language Toolkit (NLTK) and the Open Language Archives Community (OLAC). He is an associate professor at the University of Melbourne and a senior research associate at the University of Pennsylvania.