In these few posts, I will be describing my personal view of the kind of developments I would like to see in spatial hearing research. You might wonder: if this is any good, then why would I put it on my blog rather than in a grant proposal? Well, I have hesitated for a while but there are only so many things you can do in your life, and in the end I would just be glad if someone would pick up some of these ideas and made some progress in an interesting direction. Some of them are pretty demanding both in terms of efforts and expertise, which is also a reason why I am not likely to pursue all of these myself. And finally I believe in open science, and it would be interesting to read some comments or have some discussions. All this being said, I am open to collaboration on these subjects if one is motivated enough.
The basic question is: how do we (or animals) localize sounds in space? (this does not cover all of spatial hearing)
My personal feeling is that the field has made some real progress on this question but has now exploited all there is to exploit in the current approaches. In a nutshell, those approaches are: consider a restricted set of lab stimuli, typically a set of sounds that are varied in one spatial dimension, and look at how physiological and behavioral responses change when you vary that spatial parameter (the “coding” approach).
Let us start with what I think is the most fundamental point: the stimuli. For practical reasons, scientists want to use nice clean reproducible sounds in their experiments, for example tones and bursts of white noise. There are very good reasons for that. One is that if you want to make your results reproducible by your peers, then it's simpler to write that you used a 70 dB pure tone of frequency 1000 Hz than the sound of a mouse scratching the ground, even though the latter is clearly a more ecologically relevant sound for a cat. Another reason is that you want a clean, non noisy signal both for reproducibility reasons and because you don't want to do lots of experiments. Finally, you typically vary just one stimulus parameter (e.g. azimuthal angle of the source) because that already makes a lot of experiments.
All of this is very sensible, but it means that in terms of the computational task of localizing a sound, we are actually looking at a really trivial task. Think about it as if you were to design a sound localization algorithm. Suppose all sounds are going to be picked up from a set of tones that vary along a spatial dimension, say azimuth. How would you do it? I will tell you how I would do it: measure the average intensity at the left ear, and use a table to map it to sound direction. Works perfectly. Obviously that's not what actual signal processing techniques do, and probably that's not what the auditory system does. Why not? Because in real life, you have confounding factors. With my algorithm, you would think loud sounds come from the left and soft sounds from the right. Not a good algorithm. The difficulty of the sound localization problem is precisely to locate sounds despite all the possible confounding factors, ie all the non-spatial properties of sounds. There are many of them: level, spectrum, envelope, duration, source size, source directivity, early reflections, reverberation, noise, etc. That's why it's actually hard and algorithms are not that good in ecological conditions. That is the ecological problem, but there is actually very little research on it (in biology). As I argued in two papers (one about the general problem and one applied to binaural neurons), the problem that is generally addressed is not the ecological problem of sound localization, but the problem of sensitivity to sound location, a much simpler problem.
This state of affairs is very problematic in my opinion when it comes to understanding “neural representations” of sound location, or more generally, how the auditory system deals with sound location. For example, many studies have looked at the information content of neural responses and connected it with behavioral measurements. There are claims such as: this neuron's firing contains as much information about sound location as the entire organism. Other studies have claimed to have identified optimal codes for sound location, all based on the non-ecological approach I have just described. Sorry to be blunt, but: this is nonsense. Such claims would have been meaningful if we actually lived in a world of entirely identical sounds coming from different directions. And so in that world my little algorithm based on left ear intensity would probably be optimal. But we don't live in that world, and I would still not use the left-ear algorithm even if I encountered one of those sounds. I would use the algorithm that works in general, and not care so much about algorithms that are optimal for imaginary worlds.
What do we mean when we say that “neurons encode sound location”? Certainly we can't mean that neurons responses are sensitive to location, ie they vary when you vary sound location, because that would be true of basically all neurons that respond to sounds. If this is what we mean, then we are just saying that a sizeable portion of the brain is sensitive to auditory stimuli. Not that interesting. I think we mean, or at least we should mean, that neurons encode sound location specifically, that is, there is something in the collective response of the neurons that varies with sound location and not with other things. This something is the “representation”, and its most basic property is that it does not change if the sound location percept does not change. Unfortunately that property cannot be assessed if all you ever vary in your stimulus is the spatial dimension, and so in a nutshell: current approaches based on restricted stimulus sets cannot, by construction, address the question of neural representations of sound location. They address the question of sensitivity – a prerequisite, but really quite far from the actual ecological problem.
So I think the first thing to do would be to start actually addressing the ecological problem. This means essentially inverting the current paradigm: instead of looking at how responses (physiological/behavioral) change when a spatial dimension is varied, look at how they change (or at what doesn't change) when non-spatial dimensions are varied. I would proceed in 3 steps:
1) Acoustics. First of all, what are the ecological signals? Perhaps surprisingly, no one has measured that systematically (as far as I know). That is, for an actual physical source at a given location, not in a lab (say in a quiet field, to simplify things), how do the binaural signals look like? What is the structure of noise? How do the signals vary over repetititions, or if you use a different source? One would need to do lots of recordings with different source sources and different acoustic configurations (we have started to do that a little bit in the lab). Then we would start to have a reasonable idea of what the sound localization problem really is.
2) Behavior. The ecological problem of sound localization is difficult, but are we actually good at it? So far, I have not seen this question addressed in the previous literature. Usually, there is a restricted set of sounds, with high signal-to-noise ratio, often noises or clicks. So actually, we don't know how good we (or animals) are at localizing sounds in ecological situations. Animal behavior experiments are difficult, but a lot could be done with humans. There is some psychophysical research that tends to show that humans are generally not too much affected by confounding factors (eg level); it's a good starting point.
3) Physiology. As mentioned above, the point is to identify what in neural responses is specifically about sound location (or more precisely, perceived sound location), as opposed to other things. That implies to vary not only the spatial dimension but also other dimensions. That's a problem because you need more experiments, but you could start with one non-spatial dimension that is particularly salient. There is another problem, which is that you are looking for stable properties of neuron responses, but it's unlikely that you find that in one or a few neurons. So probably, you would need to record from many neurons (next post), and this gets quite challenging.
Next post is a criticism of tuning curves; and I'll end on stimulating vs. recording.
Update (6 Jan 2021): I am sharing a grant proposal on this subject. I am unlikely to do it myself, so feel free to reuse the ideas. I am happy to help if useful.
Ping : Project: Binaural cues and spatial hearing in ecological environments | Romain Brette
Thanks a lot for the article. How would you go about answering "how do the binaural signals look like?" Would you use some kind of binaural microphones for this?
Absolutely, binaural earphones on a human subject (or maybe Kemar) and real physical sources (sounding objects, speech).