Audiovisual interactions during speech perception: A whole-head MEG study

Hertrich, I.¹, Mathiak, K.², Ackermann, H.¹, and Lutzenberger, W.²
¹Department of Neurology, University of Tübingen, Germany; ²MEG Center, University of Tübingen, Germany
E-mail: ingo.hertrich@uni-tuebingen.de

The perceptual fusion of visually presented phonetic features into the auditory speech-processing stream is known as the "McGurk effect". Magnetoencephalography (MEG) was used to assess the time course and hemispheric asymmetry of speech-related audiovisual interactions. Video recordings of the syllables /pa/ and /ta/ were each synchronized with the synthetic syllables /pa/ and /ta/. Using an oddball design, audiovisually congruent syllables served as frequent stimuli, whereas rare events deviated either in the visual or in the acoustic domain. As a non-speech control condition, the same videos were synchronized with two complex non-speech tone signals differing in pitch. To direct attention to the auditory channel, subjects were required to give a behavioral response whenever they did not perceive the frequent syllable or tone. As expected, acoustic deviants elicited a mismatch response peaking at ca. 180 ms after acoustic stimulus onset, which could be modeled by a bilateral dipole structure within the region of the auditory cortex. Visual deviants elicited a later mismatch response (220 ms) irrespective of whether the acoustic signal was speech or non-speech, its magnetic source being located within the visual rather than the auditory cortex. Furthermore, subspace projection of the responses to visual deviants onto the 'auditory' mismatch dipoles yielded a significant speech/non-speech x hemisphere interaction, the speech condition giving rise to left-lateralized MEG responses with a latency of ca. 280 ms. These findings indicate that the phonological operations underlying the McGurk effect occur relatively late during perceptual processing and do not coincide with the sensory memory operations reflected in the auditory mismatch response.
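Projecting MEG responses onto a fixed pair of dipoles can be illustrated with a minimal sketch. It assumes that each dipole is represented only by its field pattern across sensors (one value per channel) and that source amplitudes are estimated per time sample by ordinary least squares. The function name `project_bilateral` and the toy data are hypothetical illustrations; this is not necessarily the procedure used in the study.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length channel vectors."""
    return sum(x * y for x, y in zip(u, v))


def project_bilateral(
    left: Sequence[float],
    right: Sequence[float],
    samples: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Least-squares amplitudes of two fixed dipole field patterns.

    left, right: hypothetical sensor topographies of the left and right
    'auditory' dipoles (one value per channel).
    samples: measured field at each time point (one value per channel).
    Returns (left_amplitude, right_amplitude) for each time point.
    """
    # Gram matrix of the two field patterns (2 x 2, symmetric).
    g11, g12, g22 = dot(left, left), dot(left, right), dot(right, right)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("dipole field patterns are (nearly) collinear")
    amplitudes = []
    for b in samples:
        # Project the measured field onto each pattern ...
        r1, r2 = dot(left, b), dot(right, b)
        # ... and solve the 2 x 2 normal equations via the explicit inverse.
        s_left = (g22 * r1 - g12 * r2) / det
        s_right = (-g12 * r1 + g11 * r2) / det
        amplitudes.append((s_left, s_right))
    return amplitudes


# Toy example: a field generated as 2 x left pattern + 3 x right pattern.
left_pattern = [1.0, 0.0, 1.0]
right_pattern = [0.0, 1.0, 1.0]
print(project_bilateral(left_pattern, right_pattern, [[2.0, 3.0, 5.0]]))
```

With noise-free toy data the original amplitudes (2, 3) are recovered exactly. Comparing the resulting left and right source waveforms across conditions is one way a hemisphere-by-condition effect could then be tested.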