Lecture 12 - Speech perception
Topics in this session
● How is the speech signal decoded?
● How is the speech signal segmented?
● Where is speech comprehended?
The challenge of acoustic variation
● Acoustic variation is a key factor in models of voice perception and speaker identification, because
perceptual processes must cope with variable input in order to achieve perceptual constancy
● Accents - English vs Scottish
○ “Glasgow is great!” - The same words sound very different in the two accents
The challenge of speech segmentation
Spectrogram
Speech is a continuous signal that does not have punctuation or
spaces
e.g., “I scream” = “ice cream”
How does the brain extract phonemes and words from this signal?
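The “I scream” / “ice cream” ambiguity can be sketched in a few lines of Python. This is only an illustration, not a model of how the brain segments speech: the toy lexicon and its rough phoneme spellings (e.g. "aI" for the vowel in “I”) are assumptions made up for this example. It shows that one unbroken sound sequence can be split into words in more than one valid way.

```python
def segmentations(signal, lexicon):
    """Return every way to split `signal` into consecutive lexicon entries."""
    if not signal:
        return [[]]  # nothing left to parse: one valid (empty) continuation
    parses = []
    for word, sounds in lexicon.items():
        if signal.startswith(sounds):
            # Consume this word, then parse the remainder of the signal.
            for rest in segmentations(signal[len(sounds):], lexicon):
                parses.append([word] + rest)
    return parses


# Hypothetical toy lexicon: word -> rough phoneme spelling (assumed for illustration).
TOY_LEXICON = {"I": "aI", "ice": "aIs", "scream": "skrim", "cream": "krim"}

# The continuous signal carries no spaces, so both parses are possible.
parses = segmentations("aIskrim", TOY_LEXICON)
for parse in parses:
    print(" + ".join(parse))
```

Both “I + scream” and “ice + cream” come out as complete parses, so the sound sequence alone cannot settle where the word boundary falls.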
Solution 1 - Categorical perception of phonemes?
● Phonemes - any of the perceptually distinct units of sound in a specified language that distinguish
one word from another, for example p, b, d, and t in the English words pad, pat, bad, and bat.
Infinitely varying acoustic signals are mapped, or ‘slotted’, into discrete categorical
representations
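A minimal sketch of this ‘slotting’ idea, under stated assumptions: the acoustic cue is taken to be voice onset time (VOT) on a /b/-/p/ continuum, and the boundary (25 ms) and slope values are illustrative choices, not figures from the lecture. A steep identification function turns a continuously varying cue into one of two labels.

```python
import math

BOUNDARY_MS = 25.0  # assumed category boundary on the VOT continuum
SLOPE = 0.5         # assumed steepness of the identification function


def prob_p(vot_ms):
    """Probability of reporting /p/ for a given voice onset time (logistic curve)."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def categorize(vot_ms):
    """Map a continuous VOT value onto a discrete phoneme label."""
    return "p" if prob_p(vot_ms) >= 0.5 else "b"


# Equal 10 ms steps along the continuum, from 0 ms to 50 ms.
labels = [categorize(v) for v in range(0, 60, 10)]
print(labels)
```

In this sketch, two sounds 10 ms apart within a category (e.g. 0 ms and 10 ms) receive the same label, while two sounds 10 ms apart that straddle the boundary (20 ms and 30 ms) receive different labels.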
The motor theory of speech perception
● Phonemes are recognised by inferring the articulatory movements required to produce them
What’s the evidence?
The discovery of mirror neurons
● Some neurons in the monkey premotor cortex fire both when executing and
when observing actions - Mirror neurons
Mirror Neurons in Broca’s Area Respond to Observing Mouth Movements
● In humans, “mirror neurons” in Broca’s Area respond to lipreading
(observing mouth movements)
Speech representations could have motor properties
Paul Broca’s two stroke patients had left inferior frontal lesions - they could not speak
“Broca’s Area” is important for speaking
However, patients with Broca’s aphasia can understand speech
Patients with lesions in Broca’s Area
● They all have impaired speech production
BUT performance is normal in:
● Syllable-syllable matching (phonetic processing), e.g., /ba/-/ba/, /ba/-/pa/
● Word-picture matching (speech comprehension)
TMS pulses to lip and tongue motor areas modulate phoneme perception in noise
● Different phonemes are articulated differently
● Labial (Lab) Phonemes like /p/ in ‘pea’ rely on lip movements
● Dental (Dent) Phonemes like /θ/ in ‘three’ rely on tongue movements
Two short TMS pulses delivered just before the phoneme ‘warm up’ the lip or tongue motor
regions
Motor system contributes to ‘difficult’ speech perception only
100% indicates no change
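One way a “100% = no change” scale can arise is by expressing performance with TMS as a percentage of performance without TMS. The sketch below assumes that normalisation and a reaction-time measure; neither is stated in these notes, and the numbers are made up purely to show the arithmetic.

```python
def percent_of_baseline(tms_value, baseline_value):
    """Express a TMS-condition measure as a percentage of the no-TMS baseline."""
    if baseline_value <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * tms_value / baseline_value


# Hypothetical reaction times in milliseconds (illustration only, not real data).
no_change = percent_of_baseline(500, 500)  # same as baseline
faster = percent_of_baseline(450, 500)     # quicker responses than baseline
slower = percent_of_baseline(550, 500)     # slower responses than baseline
print(no_change, faster, slower)
```

On a reaction-time scale of this kind, values below 100% would mean faster responses than baseline and values above 100% slower ones.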
Double dissociation
McGurk effect
● Demonstrates that speech perception is multisensory
● Auditory-visual illusion that illustrates how perceivers merge information for speech sounds across
the senses.
● For example, when we hear the sound “ba” while seeing the face of a person articulate “ga,” many
adults perceive the sound “da,” a third sound which is a blend of the two.
McGurk illusion
● May arise from mirrored silent articulation
● How does the brain perceive “ba ba” as “da da” or “va va”?
● Lip movements of /ga ga/ -> mirror neurons are activated to generate the motor signals for saying “ga
ga” -> the motor signals are converted into a prediction of what “ga ga” sounds like
○ Hear /ba/ /ba/ + predict /ga/ /ga/ (auditory sensation of hearing “ga ga”) = /da/ /da/
McGurk illusion: Motor region
● The Motor Region Is Most Active When Perceiving the McGurk Illusion
○ Auditory system is highly involved