EECS 225d Guest Lecture: Speech Synthesis

Kim Silverman

Apple

Monday, April 16, 2012
4:00 - 5:30 p.m.

Kim Silverman, "Speech Synthesis," available at http://youtu.be/7mjh0PSUv0M

Conversion of text to speech requires processing at many levels of representation. This presentation will systematically step through text normalization, named entity extraction, part-of-speech tagging, phrasing, topic tracking, pronunciation, intonation, duration, phonology, phonetics, and signal representation. Examples of each stage, the difficulties encountered, and some typical approaches will be illustrated. This will give students a solid background for evaluating recent research approaches to HMM synthesis and to the modeling of speaker emotion.

Bio:
Kim Silverman is a principal research scientist at Apple, where among other things he led the development of the Alex text-to-speech synthesis system that is the flagship American English voice in OS X. He is the first author of the ToBI standard for transcribing speech prosody. His PhD in computational modeling of intonation is from Cambridge University. His post-doctoral research was at Bell Labs, where he rewrote the intonation subsystem for the AT&T speech synthesizer. His publications and patents span human speech production and perception; speech synthesis and recognition; speaker authentication; and human-computer interaction.