Speech Processing in the Auditory Pathway

Werner Hemmert, Benedikt Grothe

Project summary

One of the most critical steps in encoding sound for neuronal processing occurs when the analog pressure wave is coded into discrete nerve-action potentials. Recent pool models of the inner hair cell synapse do not reproduce the dead-time period that follows an intense stimulus, so we used visual inspection and automatic speech recognition (ASR) to investigate a model with enhanced offset adaptation. We found that offset adaptation improved phase locking in the auditory nerve and raised ASR accuracy for features derived from auditory nerve fibers (ANFs). Offset adaptation also proved crucial for auditory processing by onset neurons (ONs) at the next neuronal stage, the auditory brainstem. A second important finding was that multi-layer perceptrons (MLPs) performed much better than standard Gaussian mixture models (GMMs) for both our ANF-based and ON-based auditory features. Because MLPs are also easy to use in a multi-stream approach, they will facilitate combining features derived from different groups of neurons, which we hope to exploit in the next steps.

The electrophysiological experiments in this project provided first answers to how our auditory system copes with the fact that, under natural conditions, the statistical properties of the sound entering the ear change dramatically over time (for example, due to changes in intensity). We showed that a model with an intensity-dependent spectrotemporal receptive field (STRF) could predict responses to stimuli with varying intensity. Despite the complexity of auditory feature selectivity in the inferior colliculus (IC), our results provide encouraging evidence that modeling nonlinear responses to complex stimuli is a tractable problem.
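To make the idea of an intensity-dependent STRF concrete, the sketch below predicts a firing-rate trace with a linear STRF followed by half-wave rectification, and switches between two STRFs according to the stimulus level in each time bin. This is a minimal illustration under stated assumptions, not the model fitted in the project: the two-entry bank (`"low"`/`"high"`), the 60 dB switching threshold, the rectifying output, and all toy numbers are hypothetical.

```python
from typing import Dict, List

# Spectrograms are [frequency channel][time bin]; STRFs are [frequency channel][lag].
Matrix = List[List[float]]


def predict_rate(strf_bank: Dict[str, Matrix], spec: Matrix,
                 levels_db: List[float], threshold_db: float = 60.0) -> List[float]:
    """Predict a firing-rate trace with a level-switched linear STRF.

    For each time bin t, an STRF is taken from a hypothetical two-entry bank
    ("low" / "high") according to the stimulus level in that bin, and
    r(t) = max(0, sum_f sum_tau STRF[f][tau] * spec[f][t - tau]).
    """
    n_freq, n_time = len(spec), len(spec[0])
    rate = []
    for t in range(n_time):
        strf = strf_bank["high"] if levels_db[t] >= threshold_db else strf_bank["low"]
        total = 0.0
        for f in range(n_freq):
            for tau, weight in enumerate(strf[f]):
                if t - tau >= 0:  # only present and past stimulus bins contribute
                    total += weight * spec[f][t - tau]
        rate.append(max(total, 0.0))  # half-wave rectification: rates are non-negative
    return rate


if __name__ == "__main__":
    # Toy example: one active channel; the "high" STRF is more transient (hypothetical).
    bank = {
        "low": [[1.0, 0.5], [0.0, 0.0]],
        "high": [[0.5, -0.5], [0.0, 0.0]],
    }
    spec = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
    levels = [40.0, 40.0, 80.0, 80.0]
    print(predict_rate(bank, spec, levels))  # [1.0, 1.5, 0.0, 0.0]
```

With a single fixed STRF, the response to the constant input would stay sustained; switching to the more transient "high" STRF when the level rises changes the predicted response shape, which is the kind of level-dependent behavior a fixed linear model cannot capture.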

Offset adaptation (OA) improves speech recognition scores considerably:
by 12.5% in the HTK testbed and by 19.6% in the MLP testbed.
The MLP greatly improves the interfacing of auditory-based features to the
conventional ASR back end: the relative improvement in word error rate
was 27.2% without OA and 33.1% with OA.
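For readers unfamiliar with how such relative figures are usually obtained, the sketch below computes word error rate (WER) from a word-level Levenshtein alignment, WER = (substitutions + deletions + insertions) / number of reference words, and then the relative improvement 100 × (baseline − new) / baseline. It assumes the conventional definitions; the summary does not state the absolute error rates or the exact scoring tool behind its percentages, and the transcripts in the example are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


def relative_improvement(wer_baseline: float, wer_new: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    ref = "one two three four"
    baseline = word_error_rate(ref, "one too three")  # 1 substitution + 1 deletion -> 0.5
    improved = word_error_rate(ref, "one two three")  # 1 deletion -> 0.25
    print(relative_improvement(baseline, improved))   # 50.0
```

Note that a relative improvement is not the same as an absolute one: halving a WER from 0.5 to 0.25 is a 50% relative improvement but only 25 percentage points absolute.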

Related Publications

Journal publications

Lesica, N. A. and Grothe, B. (2008): Dynamic Spectrotemporal Feature Selectivity in the Auditory Midbrain. J. Neurosci. 28, 5412-5421.

Lesica, N. A. and Grothe, B. (2008): Efficient temporal processing of naturalistic sounds. PLoS One 3(2): e1655.

Publications in conference proceedings (contributions to ISCA conferences are peer-reviewed and have a high rejection rate)

Wang, H., Gelbart, D., Hirsch, H.-G. and Hemmert, W. (2008): The Value of Auditory Offset Adaptation and Appropriate Acoustic Modeling. 9th Annual Conference of the International Speech Communication Association (Interspeech 2008), pp. 902-905.

Hemmert, W. and Wang, H. (2008): Offset adaptation in the inner hair cell synapses is important for speech coding. Proc. 31st ARO Midwinter Meeting, #208.

Wang, H. and Hemmert, W. (2009): Coding of speech into nerve-action potentials. International Conference on Acoustics NAG/DAGA 2009, Rotterdam.