This week's paper selection (18-25 Nov 2015)

I have decided to post once a week the list of papers that have caught my attention.

I have not necessarily read these papers yet, so do not take this as a list of recommendations: these are simply papers I am curious to read. Some are not recent, in which case I mention the year of publication.


Some propositions for future spatial hearing research (III) - The coding problem

In the previous posts, I proposed that we look at the ecological problem of sound localization, and that in terms of physiology we go beyond tuning curves. However, even if all of this is addressed, a big problem remains. We are looking for “neural representations” or “codes”, but neural representations are observer-centric concepts that make little sense from the viewpoint of the organism, as I have discussed a few times before (for example here). Neural responses are not there to be read by some little homunculus; they are just neurons exciting other neurons, neurons that you are not recording. Those other neurons are not “reading the code”; you are. They are simply reacting instantly to the electrical stimulation of the neurons that constitute what we like to call a “neural representation”.

Not everyone is receptive to the philosophical points, so let me just give one example. You could look at the responses of lots of binaural neurons and realize that they have lots of different tunings. So you could suggest: maybe sound location is represented by the most active neurons. But someone else realizes that the average response of all those neurons varies gradually with sound location, so maybe sound location is actually encoded in the average response? Wait a minute: why throw all this information away? Maybe sound location is represented by the entire pattern of activity? The problem we are facing here is not that we don't know how to determine which one is true, but rather that all of them are true (the last one trivially so). Yes, sound location is represented by the identity of the most active neurons, by the average response and by the pattern of activity: there is a mapping between sound location and each of those features. That is, you, the external observer, can look at those features and guess what the sound location was. What is this supposed to prove?
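To make the point concrete, here is a toy simulation (all numbers made up) in which the three candidate “codes” are simultaneously true: a population of location-tuned model neurons in which the most active neuron, the average rate, and the full activity pattern each map back to source azimuth.

```python
import numpy as np

azimuths = np.linspace(-90, 90, 37)    # candidate source azimuths (deg)
preferred = np.linspace(-90, 90, 37)   # hypothetical preferred azimuths

def population_response(az):
    """Toy model: Gaussian tuning plus a slight overall rate gradient."""
    return 1.0 + np.exp(-0.5 * ((az - preferred) / 30.0) ** 2) + 0.01 * az

templates = np.array([population_response(a) for a in azimuths])
r = population_response(40.0)          # response to a source at 40 deg

# 1) "most active neuron" code
print(preferred[np.argmax(r)])                                    # 40.0
# 2) "average rate" code (the mean rate happens to vary monotonically)
print(azimuths[np.argmin(np.abs(templates.mean(1) - r.mean()))])  # 40.0
# 3) "full pattern" code: nearest template
print(azimuths[np.argmin(np.linalg.norm(templates - r, axis=1))]) # 40.0
```

All three decoders recover 40 degrees, which is exactly the problem: each mapping is true, so decodability alone proves little.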

By focusing on neural representations, we are not looking at the right problem. What we want to know in the end is not so much how neural activity varies with various parameters of an experiment, but how neural activity constitutes the spatial percept, or perhaps more modestly, how it drives behavioral orientation responses. Now certainly looking at neural responses is a necessary first step, but we can't answer the interesting question if we stop there. So how can we answer the interesting question?

Well, I won't claim that I have a good answer, because I think that's one of the major conceptual problems in systems neuroscience today. But one proposition that I think goes in the right direction is to do stimulations instead of, or in conjunction with, recordings. Ideally, those stimulations should be such as to trigger behaviors. Is average activity the important feature? Stimulate neurons at different places and you should see the same orientation response. Is the identity of the active neurons important? With the same experiment, you should see different responses, varying systematically with the stimulated neurons.

It's possible: it was actually done 45 years ago (Syka and Straschill, 1970). Electrical stimulation of the inferior colliculus with microelectrodes can trigger specific orienting responses. These days one could probably also do optogenetic stimulation. It's not going to be simple, but I think it's worth it.

Some propositions for future spatial hearing research (II) - Tuning curves

In the previous post, I proposed to look at the ecological problem of sound localization, rather than the artificial and computationally trivial problem that is generally addressed. As regards physiology, this means that a neural representation of sound location is a property of collective neural responses that is unchanged for the class of stimuli that produce the same spatial percept. This is not a property that you will find at the single-neuron level. To give a sense of what kind of property I am talking about, consider the Jeffress model, a classic model of sound localization. It goes as follows: each neuron is tuned to a particular location, and there are a bunch of neurons with different tunings. When a sound is presented, you identify the most active neuron, and that tells you where the sound comes from. If it is the same neuron that is most active for different sounds coming from the same location, then you have the kind of representation I am talking about: the maximally active neuron is a representation of (specifically) sound location.

The Jeffress model actually has this kind of nice property (unlike competitors), but only when you see it as a signal processing model (cross-correlation) applied to an idealized acoustical situation where you have no head (ie two mics with just air between them). What we pointed out in a recent paper in eLife is that it loses that property when you consider sound diffraction introduced by the head; quite intriguingly, it seems that binaural neurons actually compensate for that (ie their tunings are frequency-dependent in the same way as interaural time differences are frequency-dependent).
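In the idealized no-head situation, the signal-processing version of the Jeffress model is easy to sketch: a bank of coincidence detectors, one per internal delay, is equivalent to evaluating the cross-correlation of the two ear signals at those delays. A minimal sketch, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4410)        # ~100 ms of source noise at 44.1 kHz
itd = 15                             # true interaural delay, in samples

left = x
right = np.roll(x, itd)              # idealized no-head case: a pure delay

# One "coincidence detector" per candidate internal delay d:
# its output is the correlation of the ear signals at lag d.
lags = np.arange(-40, 41)
cc = np.array([np.dot(left, np.roll(right, -d)) for d in lags])

print(lags[np.argmax(cc)])           # 15: the best-matching internal delay
```

With head diffraction, the interaural delay becomes frequency-dependent, so a single best internal delay no longer exists; that is the property at stake in the eLife paper.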

But I want to discuss a more fundamental point that has to do with tuning curves. By “tuning curve”, I am referring to a measurement of how the firing rate of a neuron varies when one stimulus dimension is varied. Suppose that you do indeed have neurons that are tuned to different sound locations. Then you present a stimulus (of the same kind) and you look for the maximally active neuron. The tuning of that neuron should match the location of the presented stimulus. Right? Well, actually no. At least not in principle. That would be true if all tuning curves had exactly the same shape and peak value and only differed by a translation, or at least if shape and magnitude were not correlated with tuning. Otherwise it's just an incorrect inference. If you don't see what I mean, look at this paper on auditory nerve responses. Usually one would show selectivity curves of auditory nerve fibers, ie firing rate vs. sound frequency for a bunch of fibers (note that auditory scientists also use “tuning curve” to mean something else: the minimum sound level that elicits a response, as a function of frequency). Here the authors show the data differently in Fig. 1: the responses of all fibers along the cochlea, for a bunch of frequencies. I bet it is not what you would expect from reading textbooks on hearing. Individually, fibers are tuned to frequency. Yet you can't really pick the most active fiber and tell what sound frequency was presented. Actually, there are different frequencies at which the response peaks at the same place. It's basically a mess. But that is what the auditory system gets when you present a sound: the response of the entire cochlea to one sound, not the response of one neuron to lots of different stimuli.
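Here is a minimal illustration of that incorrect inference, with synthetic tuning curves (all parameters made up): each cell is genuinely tuned, but peak height correlates with tuning, so the most active cell does not match the stimulus.

```python
import numpy as np

preferred = np.linspace(-90, 90, 37)   # each cell's preferred azimuth (deg)

def rates(az):
    # Gaussian tuning, but peak height grows with preferred azimuth
    gain = 1.0 + (preferred + 90.0) / 90.0        # 1x at -90 deg, 3x at +90 deg
    return gain * np.exp(-0.5 * ((az - preferred) / 40.0) ** 2)

r = rates(0.0)                          # population response to a source at 0 deg
print(preferred[np.argmax(r)])          # 10.0, not 0.0: the inference is biased
```

The bias is systematic, not noise: it persists no matter how many trials you average.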

So, what about sound localization and binaural neurons: do we have this kind of problem or not? Well, I don't know for sure, because no one actually shows whether the shape of tuning curves varies systematically with tuning. Most of the time, one shows a few normalized responses and then extracts a couple of features of the tuning curves for each cell (ie the best frequency and best ITD) and shows some trends. The problem is that we can't infer the population response from tunings unless we know quite precisely how the tuning curves depend on tuning. That is particularly problematic when tuning curves are broad, which is the case for the rodents used in many physiological studies.

I see two ways to solve this problem. One is to prove that there is no problem: you look at tuning curves and show that there is no correlation between tuning and any other characteristic of the tuning curves (for example, average the tuning curves of cells with the same tuning, and compare across tunings). That would be quite reassuring. My intuition: it will work at high frequencies, maybe, or in the barn owl perhaps (quite narrow curves), but not at low frequencies, and not for most cells in rodents (guinea pigs and gerbils).
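In practice, the control could look like this sketch (synthetic data standing in for recorded tuning curves; the drift of curve width with tuning is assumed purely for illustration): extract per-cell shape features, then test whether they correlate with tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
pref = np.linspace(-80, 80, 100)                   # per-cell preferred azimuth (deg)
width = 30.0 + 0.1 * pref + rng.normal(0, 2, 100)  # width drifts with tuning (assumed)
height = 1.0 + rng.normal(0, 0.05, 100)            # height does not

print(np.corrcoef(pref, width)[0, 1])   # strong: shape depends on tuning -> problem
print(np.corrcoef(pref, height)[0, 1])  # near zero: height is fine
```

If the first correlation came out near zero as well, the argmax-style inference would be safe; if not, tunings alone do not determine the population response.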

If it doesn't work and there are correlations, then the problem gets quite complicated. You could think of looking for a parametric representation of the responses. It's a possibility, and one might make some progress this way, but it might become quite difficult when you add extra stimulus dimensions (level, etc.). There is also the issue of gathering data from several animals, which will introduce extra variability.

The only clean way I see of dealing with this problem is to actually record the entire population response (or a large part of the structure). It sounds very challenging, but large-scale recording techniques are really progressing quite fast these days. Very dense electrode arrays, various types of imaging techniques; it's difficult but probably possible at some point.

Some propositions for future spatial hearing research (I) – The ecological situation and the computational problem

In these few posts, I will be describing my personal view of the kind of developments I would like to see in spatial hearing research. You might wonder: if this is any good, then why would I put it on my blog rather than in a grant proposal? Well, I have hesitated for a while, but there are only so many things you can do in your life, and in the end I would just be glad if someone picked up some of these ideas and made some progress in an interesting direction. Some of them are pretty demanding, both in terms of effort and of expertise, which is also a reason why I am not likely to pursue all of them myself. And finally, I believe in open science, and it would be interesting to read some comments or have some discussions. All this being said, I am open to collaboration on these subjects with anyone motivated enough.

The basic question is: how do we (or animals) localize sounds in space? (this does not cover all of spatial hearing)

My personal feeling is that the field has made some real progress on this question but has now exploited all there is to exploit in the current approaches. In a nutshell, those approaches are: consider a restricted set of lab stimuli, typically a set of sounds that are varied in one spatial dimension, and look at how physiological and behavioral responses change when you vary that spatial parameter (the “coding” approach).

Let us start with what I think is the most fundamental point: the stimuli. For practical reasons, scientists want to use nice, clean, reproducible sounds in their experiments, for example tones and bursts of white noise. There are very good reasons for that. One is that if you want your results to be reproducible by your peers, it's simpler to write that you used a 70 dB pure tone of frequency 1000 Hz than the sound of a mouse scratching the ground, even though the latter is clearly a more ecologically relevant sound for a cat. Another reason is that you want a clean, non-noisy signal, both for reproducibility and because you don't want to do lots of experiments. Finally, you typically vary just one stimulus parameter (e.g. azimuthal angle of the source), because that already makes a lot of experiments.

All of this is very sensible, but it means that in terms of the computational task of localizing a sound, we are actually looking at a really trivial task. Think about it as if you were to design a sound localization algorithm. Suppose all sounds are going to be picked up from a set of tones that vary along a spatial dimension, say azimuth. How would you do it? I will tell you how I would do it: measure the average intensity at the left ear, and use a table to map it to sound direction. Works perfectly. Obviously that's not what actual signal processing techniques do, and probably that's not what the auditory system does. Why not? Because in real life, you have confounding factors. With my algorithm, you would think loud sounds come from the left and soft sounds from the right. Not a good algorithm. The difficulty of the sound localization problem is precisely to locate sounds despite all the possible confounding factors, ie all the non-spatial properties of sounds. There are many of them: level, spectrum, envelope, duration, source size, source directivity, early reflections, reverberation, noise, etc. That's why it's actually hard and algorithms are not that good in ecological conditions. That is the ecological problem, but there is actually very little research on it (in biology). As I argued in two papers (one about the general problem and one applied to binaural neurons), the problem that is generally addressed is not the ecological problem of sound localization, but the problem of sensitivity to sound location, a much simpler problem.
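My naive algorithm, and how a confounding factor breaks it, in a few lines (the head-shadow model here is entirely made up for illustration):

```python
import numpy as np

def left_ear_level(az, source_db):
    # toy attenuation model (made up): leftward sources are louder at the left ear
    return source_db - az / 10.0

azimuths = np.linspace(-90, 90, 19)
# lookup table built from a single fixed 70 dB source, as in the lab
table = np.array([left_ear_level(a, 70.0) for a in azimuths])

def localize(level):
    return azimuths[np.argmin(np.abs(table - level))]

print(localize(left_ear_level(30.0, 70.0)))   # 30.0: perfect on the lab stimuli
print(localize(left_ear_level(30.0, 80.0)))   # -70.0: a louder source "moves" left
```

On the restricted stimulus set the decoder is flawless; change one non-spatial parameter (source level) and it fails completely. That is the gap between sensitivity and the ecological problem.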

This state of affairs is very problematic in my opinion when it comes to understanding “neural representations” of sound location, or more generally, how the auditory system deals with sound location. For example, many studies have looked at the information content of neural responses and connected it with behavioral measurements. There are claims such as: this neuron's firing contains as much information about sound location as the entire organism. Other studies have claimed to have identified optimal codes for sound location, all based on the non-ecological approach I have just described. Sorry to be blunt, but: this is nonsense. Such claims would have been meaningful if we actually lived in a world of entirely identical sounds coming from different directions. And so in that world my little algorithm based on left ear intensity would probably be optimal. But we don't live in that world, and I would still not use the left-ear algorithm even if I encountered one of those sounds. I would use the algorithm that works in general, and not care so much about algorithms that are optimal for imaginary worlds.

What do we mean when we say that “neurons encode sound location”? Certainly we can't mean that neural responses are sensitive to location, ie that they vary when you vary sound location, because that would be true of basically all neurons that respond to sounds. If this is what we mean, then we are just saying that a sizeable portion of the brain is sensitive to auditory stimuli. Not that interesting. I think we mean, or at least we should mean, that neurons encode sound location specifically, that is, there is something in the collective response of the neurons that varies with sound location and not with other things. This something is the “representation”, and its most basic property is that it does not change if the sound location percept does not change. Unfortunately, that property cannot be assessed if all you ever vary in your stimulus is the spatial dimension, and so in a nutshell: current approaches based on restricted stimulus sets cannot, by construction, address the question of neural representations of sound location. They address the question of sensitivity, a prerequisite, but really quite far from the actual ecological problem.

So I think the first thing to do would be to start actually addressing the ecological problem. This means essentially inverting the current paradigm: instead of looking at how responses (physiological/behavioral) change when a spatial dimension is varied, look at how they change (or at what doesn't change) when non-spatial dimensions are varied. I would proceed in 3 steps:

1) Acoustics. First of all, what are the ecological signals? Perhaps surprisingly, no one has measured that systematically (as far as I know). That is, for an actual physical source at a given location, not in a lab (say in a quiet field, to simplify things), what do the binaural signals look like? What is the structure of the noise? How do the signals vary over repetitions, or if you use a different source? One would need to do lots of recordings with different sources and different acoustic configurations (we have started to do that a little bit in the lab). Then we would start to have a reasonable idea of what the sound localization problem really is.

2) Behavior. The ecological problem of sound localization is difficult, but are we actually good at it? So far, I have not seen this question addressed in the literature. Usually, there is a restricted set of sounds with high signal-to-noise ratio, often noise bursts or clicks. So actually, we don't know how good we (or animals) are at localizing sounds in ecological situations. Animal behavior experiments are difficult, but a lot could be done with humans. There is some psychophysical research suggesting that humans are generally not much affected by confounding factors (eg level); that's a good starting point.

3) Physiology. As mentioned above, the point is to identify what in neural responses is specifically about sound location (or more precisely, perceived sound location), as opposed to other things. That implies varying not only the spatial dimension but also other dimensions. That's a problem because you need more experiments, but you could start with one non-spatial dimension that is particularly salient. There is another problem: you are looking for stable properties of neural responses, but it's unlikely that you will find them in one or a few neurons. So you would probably need to record from many neurons (next post), and that gets quite challenging.

Next post is a criticism of tuning curves; and I'll end on stimulating vs. recording.


Update (6 Jan 2021): I am sharing a grant proposal on this subject. I am unlikely to do it myself, so feel free to reuse the ideas. I am happy to help if useful.