How the brain distinguishes between voice and sound

By Petra Wiesmayer

What happens when people listen to something? How do they distinguish between sounds and spoken words? Speech is distinguished by two characteristics: the voice of the speaker and the linguistic content itself, including the sound of speech. Researchers at the University of Geneva (UNIGE) and the University of Maastricht have now demonstrated that the brain, specifically the auditory cortex, adapts to what a person wants to hear. It focuses either on the voice of a speaker or on the sound of speech.

To find out how these cerebral mechanisms of speech processing work, the researchers developed pseudo-words, words without any meaning, which were spoken by three separate voices with different pitches. They wanted to see how the brain processes this information when it concentrates on either the voice or the speech sounds. They discovered that the auditory cortex amplifies different aspects of the sounds, depending on the listener's goal. For the differentiation of voices, the main focus was on processing voice-specific information, while phoneme-specific information was important for the differentiation of speech sounds.

Pseudo-words

“We created 120 pseudo-words that comply with the phonology of the French language yet make no sense, so as to make sure that semantic processing would not interfere with the pure perception of the phonemes,” explains Narly Golestani, professor within the psychology section at UNIGE’s Faculty of Psychology and Educational Sciences (FPSE). These pseudo-words all contained phonemes such as /p/, /t/ or /k/, as in /preperibion/, /gabratade/ and /ecalimacre/.

The pseudo-words were spoken by a female phonetician. The scientists then converted the recording into different voices, some deeper and some higher. “In order to make the voice differentiations as difficult as the speech sound differentiations, we recreated the sound of three different voices from the recorded stimuli, rather than recording three actual different people,” explains Sanne Rutten, a researcher in the Psychology Section of the FPSE at UNIGE.

“Spectral Modulations”

During the study, the researchers recorded the brain activity of the participants using functional Magnetic Resonance Imaging (fMRI) with a high magnetic field (7 Tesla). With this method, brain activity can be observed by measuring blood oxygenation in the brain: the more oxygen an area requires, the more active that area is.

At the top: Analysis of the main acoustic parameters underlying the differences in the voices (speakers) and in the speech sounds (phonemes) of the pseudo-words themselves: high spectral modulations best differentiate the voices (blue spectral profile), while fast temporal modulations (red temporal profile) along with low spectral modulations (red spectral profile) best differentiate the speech sounds. At the bottom: Analysis of the neural fMRI data: while performing the voice task, the auditory cortex amplifies higher spectral modulations (blue spectral profile), and while performing the phoneme task, it amplifies fast temporal modulations (red temporal profile) and low spectral modulations (red spectral profile). These amplification profiles are extremely similar to the acoustic profiles which differentiate between the voices and the phonemes. © UNIGE

During one scanning session, the participants heard the pseudo-words and were asked to identify the phonemes /p/, /t/ or /k/. In another session, they were asked to say whether the pseudo-words were spoken by voice 1, 2, or 3. For the evaluation, the scientists first analyzed the pseudo-words themselves. They examined differences in frequency (high/low), temporal modulation (how quickly the sounds change over time), and spectral modulation (how the energy is spread across various frequencies). They found that high spectral modulations were best at differentiating between the voices. Fast temporal modulations paired with low spectral modulations were best at differentiating between the phonemes.
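To make the two kinds of modulation more concrete, here is a minimal Python sketch. It is an illustration only and not the analysis used in the study: the function names, the 16-by-16 grid and the artificial "ripple" patterns are all assumptions made for this example. It treats a sound as a small grid of energy values (frequency bins by time frames) and uses a two-dimensional Fourier transform to find how many cycles the pattern completes across frequency (spectral modulation) and across time (temporal modulation).

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def modulation_spectrum(spectrogram):
    """Magnitude of the 2-D DFT of spectrogram[freq_bin][time_frame].

    The result is indexed as [spectral_cycles][temporal_cycles].
    """
    along_time = [dft(row) for row in spectrogram]
    along_freq = [dft(list(col)) for col in zip(*along_time)]
    n_temporal = len(along_freq)
    n_spectral = len(along_freq[0])
    return [
        [abs(along_freq[t][s]) for t in range(n_temporal)]
        for s in range(n_spectral)
    ]


def ripple(n_freq, n_time, spectral_cycles, temporal_cycles):
    """Hypothetical test pattern: a cosine ripple over frequency and time."""
    return [
        [
            math.cos(2 * math.pi * (spectral_cycles * f / n_freq
                                    + temporal_cycles * t / n_time))
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]


def dominant_modulation(spectrogram):
    """Return (spectral_cycles, temporal_cycles) of the strongest component.

    Only the lower half of the spectral axis is searched, because the
    transform of a real-valued grid contains a mirror image of every peak.
    """
    spectrum = modulation_spectrum(spectrogram)
    half = len(spectrum) // 2
    candidates = [
        (spectrum[s][t], s, t)
        for s in range(half + 1)
        for t in range(len(spectrum[0]))
    ]
    _, s, t = max(candidates)
    return s, t


# A pattern that varies a lot across frequency but slowly over time,
# loosely analogous to the cues that separated the voices.
voice_like = ripple(16, 16, spectral_cycles=6, temporal_cycles=1)

# A pattern that varies quickly over time but little across frequency,
# loosely analogous to the cues that separated the phonemes.
phoneme_like = ripple(16, 16, spectral_cycles=1, temporal_cycles=6)

print(dominant_modulation(voice_like))
print(dominant_modulation(phoneme_like))
```

Real speech is of course far more complex than these artificial ripples, and the unit here is simply cycles per grid, not a physical unit. The sketch only shows that the two dimensions can be measured separately from the same sound representation.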

Different reactions of the brain

In the next step, the researchers used computer models to analyze the fMRI responses, i.e. the brain activity in the auditory cortex while it was processing the sounds during the two tasks. “The results show large similarities between the task information in the sounds themselves and the fMRI neural data,” says Golestani. To be precise, the auditory cortex amplified the higher spectral modulations when the participants had to concentrate on the voices. When they had to concentrate on the phonemes, the cortex responded more strongly to the fast temporal modulations and to the low spectral modulations.

“This is the first time that it’s been demonstrated in humans, using non-invasive methods, that the brain adapts to the task at hand in a manner that’s consistent with the acoustic information that accompanies speech sounds,” Rutten points out. “This will be useful in our future research, especially with regard to processing other levels of language – including semantics, syntax and prosody, topics that we plan to explore in the context of a National Centre of Competence in research on the origin and future of language.”

The results of the study were published in the journal Nature Human Behaviour.
