Neural coding of continuous speech in auditory cortex during monaural and dichotic listening

TLDR

The cortical representation of acoustic features of continuous speech underlies speech perception. The study used noninvasive magnetoencephalography (MEG) to record from human subjects listening to spoken narratives in both simple and cocktail-party-like auditory scenes. The authors show that slow temporal modulations of speech are encoded bilaterally in auditory cortex by a phase-locked temporal code. For monaural speech, this response is more faithful in the right hemisphere and has a shorter latency in the hemisphere contralateral to the stimulated ear. Dichotic listening preserves the encoding of both narratives but slightly weakens and delays it. The early sensory response to attended speech is considerably stronger than that to unattended speech, evidence of top-down attentional gain control that is already substantial on first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these results characterize spectrotemporal speech encoding and establish a single-trial paradigm for studying the cocktail party effect.

Abstract

The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
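
The abstract describes modeling the MEG response as a spectrotemporal response function (STRF) but does not state how the function is estimated. As a rough illustration only, the sketch below fits a linear STRF by ridge regression on time-lagged copies of a spectrogram. Ridge regression is an assumption chosen for brevity, not the authors' stated method, and the function names (`lagged_design`, `fit_strf`, `predict`), the regularization value, and the synthetic data are all hypothetical.

```python
import numpy as np

def lagged_design(spec, n_lags):
    """Stack time-lagged copies of a (time x frequency) spectrogram."""
    n_t, n_f = spec.shape
    X = np.zeros((n_t, n_f * n_lags))
    for lag in range(n_lags):
        # Column block `lag` holds the stimulus delayed by `lag` samples.
        X[lag:, lag * n_f:(lag + 1) * n_f] = spec[:n_t - lag]
    return X

def fit_strf(spec, response, n_lags, ridge=1.0):
    """Estimate a (lags x frequency) STRF by ridge regression (assumed method)."""
    X = lagged_design(spec, n_lags)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ response)
    return w.reshape(n_lags, spec.shape[1])

def predict(spec, strf):
    """Predict the response as the STRF applied to the lagged stimulus."""
    n_lags, _ = strf.shape
    return lagged_design(spec, n_lags) @ strf.reshape(-1)

# Synthetic demonstration: random "spectrogram" and a known STRF (not real MEG data).
rng = np.random.default_rng(0)
spec = rng.standard_normal((2000, 4))        # 2000 time samples, 4 frequency bands
true_strf = rng.standard_normal((10, 4))     # 10 lags, 4 frequency bands
response = predict(spec, true_strf) + 0.1 * rng.standard_normal(2000)

est = fit_strf(spec, response, n_lags=10, ridge=1.0)
r = np.corrcoef(predict(spec, est), response)[0, 1]  # prediction accuracy
```

In a sketch like this, the correlation `r` between the predicted and measured response plays the role of a fidelity measure, and the lag at which the estimated STRF peaks plays the role of a response latency. Comparing those two quantities across hemispheres or across attended and unattended speech is the kind of analysis the abstract summarizes.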
