How the Brain Dials Up the Volume to Hear Someone in a Crowd
Our brains have a remarkable ability to pick out one voice from among many. Now, a team of Columbia University neuroengineers has uncovered the steps that take place in the brain to make this feat possible. Today’s discovery helps to solve a long-standing scientific question as to how the auditory cortex, the brain’s listening center, can decode and amplify one voice over others — at lightning-fast speeds. This new-found knowledge also stands to spur development of hearing-aid technologies and brain-computer interfaces that more closely resemble the brain.
These findings were reported today in Neuron.
“Our capacity to focus in on the person next to us at a cocktail party while eschewing the surrounding noise is extraordinary, but we understood so little about how it all works,” said Nima Mesgarani, PhD, the paper’s senior author and a principal investigator at Columbia’s Mortimer B. Zuckerman Mind Brain Behavior Institute. “Today’s study brings that much-needed understanding, which will prove critical to scientists and innovators working to improve speech and hearing technologies.”
The auditory cortex is the brain’s listening hub. The inner ear sends this brain region electrical signals that represent a jumble of sound waves from the external world. The auditory cortex must then pick out meaningful sounds from that jumble.
“Studying how the auditory cortex sorts out different sounds is like trying to figure out what is happening on a large lake — in which every boat, swimmer and fish is moving, and how quickly — by only having the patterns of ripples in the water as a guide,” said Dr. Mesgarani, who is also an associate professor of electrical engineering at Columbia Engineering.
Today’s paper builds on the team’s 2012 study showing that the human brain is selective about the sounds it hears. That study revealed that when a person listens to someone talking, their brain waves change to pick out features of the speaker’s voice and tune out other voices. The researchers wanted to understand how that happens within the anatomy of the auditory cortex.
“We’ve long known that areas of auditory cortex are arranged in a hierarchy, with increasingly complex decoding occurring at each stage, but we haven’t observed how the voice of a particular speaker is processed along this path,” said James O’Sullivan, PhD, the paper’s first author who completed this work while a postdoctoral researcher in the Mesgarani lab. “To understand this process, we needed to record the neural activity from the brain directly.”
The researchers were particularly interested in two parts of the auditory cortex’s hierarchy: Heschl’s gyrus (HG) and the superior temporal gyrus (STG). Information from the ear reaches HG first, passing through it and arriving at STG later.
To understand these brain regions, the researchers teamed up with neurosurgeons Ashesh Mehta, MD, PhD, Guy McKhann, MD, and Sameer Sheth, MD, PhD, neurologist Catherine Schevon, MD, PhD, as well as fellow co-authors Jose Herrero, PhD and Elliot Smith, PhD. Based at Columbia University Irving Medical Center and Northwell Health, these doctors treat epilepsy patients, some of whom must undergo regular brain surgeries. For this study, patients volunteered to listen to recordings of people speaking while Drs. Mesgarani and O’Sullivan monitored their brain waves via electrodes implanted in the patients’ HG or STG regions.
The electrodes allowed the team to identify a clear distinction between the two brain areas’ roles in interpreting sounds. The data showed that HG creates a rich and multi-dimensional representation of the sound mixture, whereby each speaker is separated by differences in frequency. This region showed no preference for one voice or another. However, the data gathered from STG told a distinctly different story.
“We found that that it’s possible to amplify one speaker’s voice or the other by correctly weighting the output signal coming from HG. Based on our recordings, it’s plausible that the STG region performs that weighting,” said Dr. O’Sullivan.
Taken together, these findings reveal a clear division of duties between these two areas of auditory cortex: HG represents, while STG selects. It all happens in around 150 milliseconds, which seems instantaneous to a listener.
The researchers also found an additional role for STG. After selection, STG formed an auditory object, a representation of the sound that is analogous to our mental representations of the objects we see with our eyes. This demonstrates that even when a voice is obscured by another speaker — such as when two people talk over each other — STG can still represent the desired speaker as a unified whole that is unaffected by the volume of the competing voice.
The information gleaned here could be used as the basis for algorithms that replicate this biological process artificially, such as in hearing aids. Earlier this year Dr. Mesgarani and his team announced the development of a brain-controlled hearing aid, which utilizes one such algorithm to amplify the sounds of one speaker over another.
The researchers plan to study HG and STG activity in increasingly complex scenarios that have more speakers or include visual cues. These efforts will help to create a detailed and precise picture of how each area of the auditory cortex operates.
“Our end goal is to better understand how the brain enables us to hear so well, as well as create technologies that help people — whether it’s so stroke survivors can speak to their loved ones, or so the hearing impaired can converse more easily in a crowded party,” said Dr. Mesgarani. “And today’s study is a critical way point along that path.”
This paper is titled “Hierarchical encoding of attended auditory objects in multi-talker speech perception.” This research was supported by the National Institutes of Health (NIDCD-DC014279, S10 OD018211), The Pew Charitable Trusts and Pew Biomedical Scholars Program. The authors report no financial or other conflicts of interest.
— Anne Holden, Zuckerman Institute