In the previous post, I proposed to look at the ecological problem of sound localization, rather than the artificial and computationally trivial problem that is generally addressed. As regards physiology, this means that a neural representation of sound location is a property of collective neural responses that is unchanged for the class of stimuli that produce the same spatial percept. This is not a property that you will find at the single neuron level. To give a sense of what kind of property I am talking about, consider the Jeffress model, a classic model of sound localization. It goes as follows: each neuron is tuned to a particular location, and there are a bunch of neurons with different tunings. When a sound is presented, you identify the most active neuron, and that tells you where the sound comes from. If it is the same neuron that is most active for different sounds coming from the same location, then you have the kind of representation I am talking about: the identity of the maximally active neuron is a representation of (specifically) sound location.
The Jeffress model actually has this kind of nice property (unlike competitors), but only when you see it as a signal processing model (cross-correlation) applied to an idealized acoustical situation where you have no head (ie two mics with just air between them). What we pointed out in a recent paper in eLife is that it loses that property when you consider sound diffraction introduced by the head; quite intriguingly, it seems that binaural neurons actually compensate for that (ie their tunings are frequency-dependent in the same way as interaural time differences are frequency-dependent).
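To make the cross-correlation view concrete, here is a minimal sketch in Python (my own illustration, with made-up numbers), for the idealized no-head case: each delay-tuned "neuron" is idealized as the cross-correlation of the two microphone signals at its preferred delay, and the location estimate is the preferred delay of the most active unit.

```python
import numpy as np

fs = 100_000                      # sampling rate (Hz)
true_itd = 300e-6                 # interaural time difference to recover (s)
delays = np.arange(-50, 51) / fs  # candidate internal delays, one per "neuron"

# Idealized acoustics: white noise reaching two microphones with a pure delay
# and nothing in between (no head, so no diffraction).
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)                    # 1 s of noise at the left mic
right = np.roll(left, int(round(true_itd * fs)))  # right mic = delayed copy

# Jeffress readout: each delay-tuned unit is idealized as the cross-correlation
# of the two signals at its preferred delay; the most active unit wins.
activity = np.array([np.dot(left, np.roll(right, -int(round(d * fs))))
                     for d in delays])
estimated_itd = delays[np.argmax(activity)]

print(f"estimated ITD: {estimated_itd * 1e6:.0f} us (true: {true_itd * 1e6:.0f} us)")
```

In this idealized setting the most active unit is the same whatever the noise, which is exactly the invariance property I described above; with diffraction by a head, the mapping from delay to location becomes frequency-dependent and the simple version of the argument breaks down.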
But I want to discuss a more fundamental point that has to do with tuning curves. By “tuning curve”, I am referring to a measurement of how the firing rate of a neuron varies when one stimulus dimension is varied. Suppose that indeed you do have neurons that are tuned to different sound locations. Then you present a stimulus (of the same kind) and you look for the maximally active neuron. The tuning of that neuron should match the location of the presented stimulus. Right? Well, actually no. At least not in principle. That would be true if all tuning curves had exactly the same shape and peak value and only differed by a translation, or at least if shape and magnitude were not correlated with tuning. But otherwise it's just an incorrect inference. If you don't see what I mean, look at this paper on auditory nerve responses. Usually one would show selectivity curves of auditory nerve fibers, ie firing rate vs. sound frequency for a bunch of fibers (note that auditory scientists also use “tuning curve” to mean something else, namely the minimum sound level that elicits a response vs. frequency). Here the authors show the data differently in Fig. 1: responses of all fibers along the cochlea for a bunch of frequencies. I bet that it is not what you would expect from reading textbooks on hearing. Individually, fibers are tuned to frequency. Yet you can't really pick the most active fiber and tell what sound frequency was presented. In fact, different frequencies can produce responses that peak at the same place along the cochlea. It's basically a mess. But that is what the auditory system gets when you present a sound: the response of the entire cochlea to one sound, not the response of one neuron to lots of different stimuli.
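To see the problem on a toy example (entirely made up, in Python): take Gaussian tuning curves over a location axis whose peak amplitude happens to covary with the preferred location. Every neuron is nicely tuned, yet picking the most active neuron gives a systematically biased estimate.

```python
import numpy as np

# Toy population: Gaussian tuning curves over a location axis (say, ITD in ms),
# with a peak amplitude that happens to covary with the preferred location.
preferred = np.linspace(-1.0, 1.0, 41)   # preferred locations of the neurons
width = 0.6                              # broad tuning, same for all neurons
gain = 1.0 + 0.5 * preferred             # amplitude correlated with tuning

def population_response(location):
    """Firing rate of every neuron for one stimulus at `location`."""
    return gain * np.exp(-(location - preferred) ** 2 / (2 * width ** 2))

for stim in (-0.5, 0.0, 0.5):
    best = preferred[np.argmax(population_response(stim))]
    print(f"stimulus at {stim:+.2f} -> most active neuron tuned to {best:+.2f}")
```

With these (arbitrary) numbers the readout is off by 0.15 to 0.2 on a location axis that only spans -1 to 1, and the broader the curves, the larger the bias; remove the amplitude gradient (gain = 1 everywhere) and the readout becomes exact again.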
So, what about sound localization and binaural neurons, do we have this kind of problem or not? Well, I don't know for sure, because no one actually shows whether the shape of tuning curves varies systematically with tuning or not. Most of the time, one shows a few normalized responses and then extracts a couple of features of the tuning curves for each cell (ie the tuning in frequency and ITD) and shows some trends. The problem is that we can't infer the population response from tunings unless we know quite precisely how the tuning curves depend on tuning. That is particularly problematic when tuning curves are broad, which is the case for the rodents used in many physiological studies.
I see two ways to solve this problem. One is to prove that there is no problem: you look at tuning curves, and you show that there is no correlation between tuning and any other characteristic of the tuning curves (for example, average the tuning curves of cells with the same tuning, and compare across tunings). That would be quite reassuring. My intuition: that will work at high frequencies, maybe, or in the barn owl perhaps (quite narrow curves), but not at low frequencies, and not for most cells in rodents (guinea pigs and gerbils).
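Concretely, the check could look something like the sketch below (Python; the data layout and names are hypothetical, assuming you have one measured tuning curve and one preferred ITD per cell):

```python
import numpy as np

def shape_by_tuning(itd_axis, tuning_curves, preferred_itd, n_groups=5):
    """Average re-centered tuning curves within groups of similar tuning.

    itd_axis      : (n_itd,) ITDs at which firing rates were measured
    tuning_curves : (n_cells, n_itd) firing rate of each cell at each ITD
    preferred_itd : (n_cells,) best ITD of each cell
    Returns the relative ITD axis, one average curve per group, and the peak
    rate of each average, so that any systematic change of shape or magnitude
    with tuning shows up as differences between the groups.
    """
    # Re-center every curve on its own preferred ITD (np.interp clamps at the
    # edges, which is good enough for a rough comparison).
    rel_axis = itd_axis - itd_axis.mean()
    recentered = np.array([np.interp(rel_axis, itd_axis - p, curve)
                           for curve, p in zip(tuning_curves, preferred_itd)])
    # Group cells by quantiles of preferred ITD.
    edges = np.quantile(preferred_itd, np.linspace(0, 1, n_groups + 1))
    groups = np.clip(np.searchsorted(edges, preferred_itd, side="right") - 1,
                     0, n_groups - 1)
    averages = np.array([recentered[groups == g].mean(axis=0)
                         for g in range(n_groups)])
    return rel_axis, averages, averages.max(axis=1)
```

If the re-centered group averages superimpose up to noise, the argmax-style inference is safe; if their peak rate or width drifts with preferred ITD, it is not.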
If it doesn't work and there are correlations, then the problem will get quite complicated. You could think of looking for a parametric representation of the responses. It's a possibility and one might make some progress this way, but it might become quite difficult when you add extra stimulus dimensions (level, etc.). There is also the issue of gathering data from several animals, which will introduce extra variability.
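As an illustration of what "parametric" could mean here (only a sketch: the Gabor-like form and all numbers are my own assumptions, not a claim about the actual data), one could fit each measured curve with a handful of parameters and then ask how those parameters covary with tuning across the population:

```python
import numpy as np
from scipy.optimize import curve_fit

def gabor(itd_ms, baseline, gain, best_itd, sigma, freq_khz):
    """Gaussian envelope times a cosine, one plausible shape for ITD tuning."""
    envelope = np.exp(-(itd_ms - best_itd) ** 2 / (2 * sigma ** 2))
    return baseline + gain * envelope * np.cos(2 * np.pi * freq_khz * (itd_ms - best_itd))

# Fake "recorded" cell, standing in for one measured tuning curve.
itd_ms = np.linspace(-1.0, 1.0, 41)
rates = gabor(itd_ms, 5.0, 40.0, 0.2, 0.4, 0.5)
rates += np.random.default_rng(1).normal(0.0, 1.0, itd_ms.size)

p0 = [5.0, 30.0, 0.0, 0.5, 0.5]          # rough initial guess
params, _ = curve_fit(gabor, itd_ms, rates, p0=p0)
print(dict(zip(["baseline", "gain", "best_itd", "sigma", "freq_khz"],
               np.round(params, 3))))
```

Each extra stimulus dimension adds parameters to the model and data to collect, which is where this approach gets painful.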
The only clean way I see of dealing with this problem is to actually record the entire population response (or the response of a large part of the structure). It sounds very challenging, but large-scale recording techniques are really progressing quite fast these days: very dense electrode arrays, various types of imaging techniques. It's difficult, but probably possible at some point.