Evidence for the Role of Cortical Theta Oscillations in Determining Auditory Channel Capacity for Speech

Oded Ghitza

Department of Biomedical Engineering and Hearing Research Center
Boston University

Tuesday, April 22
4:00 p.m., Lecture Hall

Studies on the intelligibility of time-compressed speech have shown flawless performance for moderate compression factors, a sharp deterioration for compression factors above three, and improved performance as a result of “repackaging,” a process of dividing the time-compressed waveform into fragments, called packets, and delivering the packets at a prescribed rate. This intricate pattern of performance reflects the capability of the auditory system to process speech streams with different information transfer rates, set by the compression factor and the repackaging parameters; the knee-point of performance defines the auditory channel capacity. This study is concerned with the cortical computation principle that determines channel capacity. Oscillation-based models of speech perception hypothesize that the speech decoding process is guided by a cascade of oscillations with θ as the “master,” capable of tracking the input rhythm, with the θ cycles aligned with the intervocalic speech fragments termed θ-syllables; intelligibility remains high as long as θ is in sync with the input, and it sharply deteriorates once θ is out of sync. In the study described here, the hypothesized role of θ was examined by measuring the auditory channel capacity of time-compressed speech that had undergone repackaging. For all compression factors tested (up to eight), the packaging rate at capacity equals 9 packets/sec, matching the upper limit of cortical θ (θmax, about 9 Hz), and the packet duration equals the duration of one uncompressed θ-syllable divided by the compression factor. The alignment of both the packaging rate and the packet duration with properties of cortical θ suggests that the auditory channel capacity is determined by θ. Irrespective of speech speed, the maximum information transfer rate through the auditory channel is the information in a speech fragment one uncompressed θ-syllable long, delivered once per θmax cycle. Equivalently, the auditory channel capacity is 9 θ-syllables/sec.
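The timing relationships stated in the abstract can be sketched numerically. The Python below is an illustrative sketch, not the study's procedure: it assumes that repackaging is done by inserting silent gaps between packets so that one packet starts per θmax cycle, and the helper names (plan_at_capacity, repackage) and the 200 ms θ-syllable duration in the example are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List

THETA_MAX_HZ = 9.0  # upper limit of cortical theta (about 9 Hz), per the abstract


@dataclass
class PacketPlan:
    packet_s: float         # compressed-speech duration inside each packet
    gap_s: float            # silence after each packet (ASSUMED repackaging scheme)
    syllables_per_s: float  # uncompressed theta-syllables delivered per second


def plan_at_capacity(compression: float, syllable_s: float,
                     rate_hz: float = THETA_MAX_HZ) -> PacketPlan:
    """Packet timing at the capacity point: one packet per theta_max cycle,
    each packet holding one uncompressed theta-syllable's worth of speech."""
    if compression < 1.0:
        raise ValueError("compression factor must be >= 1")
    packet_s = syllable_s / compression  # syllable duration / compression factor
    period_s = 1.0 / rate_hz             # one packet onset per theta_max cycle
    gap_s = period_s - packet_s
    if gap_s < 0.0:
        raise ValueError("packet does not fit within one theta_max cycle")
    # Syllables carried per packet times packets per second; this reduces to
    # rate_hz, independent of the compression factor.
    syllables_per_s = rate_hz * (packet_s * compression / syllable_s)
    return PacketPlan(packet_s, gap_s, syllables_per_s)


def repackage(samples: List[float], fs: float,
              packet_s: float, gap_s: float) -> List[float]:
    """Split a time-compressed waveform into packets and append silence
    (zeros) after each one. Hypothetical helper for illustration."""
    n_packet = max(1, round(packet_s * fs))
    n_gap = max(0, round(gap_s * fs))
    out: List[float] = []
    for start in range(0, len(samples), n_packet):
        out.extend(samples[start:start + n_packet])
        out.extend([0.0] * n_gap)
    return out


if __name__ == "__main__":
    # 200 ms is a HYPOTHETICAL theta-syllable duration, for illustration only.
    for k in (2.0, 3.0, 8.0):
        p = plan_at_capacity(k, 0.2)
        print(f"k={k:g}: packet {p.packet_s * 1000:.1f} ms, "
              f"gap {p.gap_s * 1000:.1f} ms, {p.syllables_per_s:.1f} syllables/s")
```

For example, with the hypothetical 200 ms θ-syllable and a compression factor of three, each packet holds about 67 ms of compressed speech followed by about 44 ms of silence, and the throughput stays at 9 θ-syllables/sec for any compression factor at which a packet still fits within one θmax cycle.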

Bio:

Oded Ghitza received the BSc, MSc, and PhD degrees in electrical engineering from Tel-Aviv University, Israel, in 1975, 1977, and 1983, respectively. From 1968 to 1984 he was with the Signal Corps Research Laboratory of the Israeli Defense Forces. During 1984-1985 he was a Bantrell postdoctoral fellow at MIT, Cambridge, Massachusetts, and a consultant with the Speech Systems Technology Group at Lincoln Laboratory, Lexington, Massachusetts. From 1985 to early 2003 he was with the Acoustics and Speech Research Department, Bell Laboratories, Murray Hill, New Jersey, where his research was aimed at developing models of hearing and at creating perception-based signal analysis methods for speech recognition, coding, and evaluation. From 2003 to 2011 he was with Sensimetrics Corp., Malden, Massachusetts, where he continued to acquire and model basic knowledge of auditory physiology and perception for the purpose of advancing speech, audio, and hearing-aid technology. From 2005 to 2008 he was also with the Sensory Communication Group at MIT. Since mid-2006 he has been with the Hearing Research Center and the Center for Biodynamics at Boston University, where he studies the role of brain rhythms in speech perception. In December 2010 he was appointed Research Professor in the Department of Biomedical Engineering, Boston University.