What is sound? (IX) Sound localization and vision

In this post, I want to return to a remark I made in a previous post about the relationship between vision and spatial hearing. It appears that my account of the comparative study by Heffner and Heffner (Heffner & Heffner, 1992) was not accurate. Their findings are in fact even more interesting than I thought: sound localization acuity across mammalian species is best predicted not by visual acuity, but by the width of the field of best vision.

Before I comment on this result, I need to explain a few details. Sound localization acuity was measured behaviorally in a left/right discrimination task near the midline, with broadband sounds. The authors report this discrimination threshold for 23 mammalian species, from gerbils to elephants. They then try to relate this value to various other quantities: the largest interaural time difference (ITD), which is directly related to head size; visual acuity (highest angular density of retinal cells); whether the animals are predators or prey; and the field of best vision. The latter quantity is defined as the angular width of the retina in which angular cell density is at least 75% of the highest density, so it is directly related to the inhomogeneity of cell density across the retina.
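To make the 75% criterion concrete, here is a minimal sketch that computes the width of the field of best vision from a retinal cell density profile. The Gaussian density profile is invented for illustration; the paper works from measured cell densities.

```python
import numpy as np

# Field of best vision: the angular width over which retinal cell
# density is at least 75% of its maximum (definition from the text).
angle = np.linspace(-90, 90, 1801)      # degrees, 0.1 degree steps
density = np.exp(-(angle / 20.0) ** 2)  # hypothetical cell density profile
above = density >= 0.75 * density.max()
width = angle[above].max() - angle[above].min()
print(round(width, 1))  # ~21.4 degrees for this invented profile
```

A species with a pronounced fovea (a narrow density peak) gets a small width; a species with a homogeneous retina, like the gerbil, gets a very large one.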

The results of the comparative study are not straightforward, in my view. Let us consider a few hypotheses. One hypothesis goes as follows. Sound localization acuity is directly related to the temporal precision of firing of auditory nerve fibers. If this precision is similar for all mammals, then it should correspond to a constant ITD threshold. In terms of angular threshold, sound localization acuity should then be inversely proportional to the largest ITD, and therefore to head size. The same reasoning would apply to intensity differences. Philosophically speaking, this corresponds to the classical information-processing view of perception: there is information about sound direction in the ITD, as reflected in the relative timing of spikes, and so sound direction can be estimated with a precision that is directly related to the temporal precision of neural firing. As I have argued many times in this blog, the flaw in the information-processing view is that information is defined with respect to an external reference (sound direction), which is accessible only to an external observer. Nothing in the spikes themselves is about space: why would a difference in timing between two specific neurons produce a percept of space? It turns out that, of all the quantities the authors looked at, the largest ITD is actually the worst predictor of sound localization acuity. Once the effect of the field of best vision is removed, it is essentially uncorrelated (Fig. 8).
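As an aside, the link between largest ITD and head size can be sketched with Woodworth's spherical-head approximation. This is my choice of model for illustration; the study reports functional largest ITDs measured per species.

```python
import math

def largest_itd(head_radius_m, speed_of_sound=343.0):
    """Largest ITD (in seconds) for a spherical head, using Woodworth's
    formula ITD = (r/c) * (theta + sin(theta)) at theta = 90 degrees,
    i.e. a source directly to one side."""
    theta = math.pi / 2
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# The largest ITD scales linearly with head radius:
print(round(largest_itd(0.0875) * 1e6))  # human-sized head: ~656 us
print(round(largest_itd(0.01) * 1e6))    # gerbil-sized head: ~75 us
```

Under the constant-ITD-threshold hypothesis above, the angular threshold would then scale as 1/r, which is precisely what the data fail to show.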

A second hypothesis goes as follows. The auditory system can estimate the ITD of sounds, but interpreting this ITD as the angle of the sound source requires calibration (learning), and this calibration requires vision. Therefore, sound localization acuity is directly determined by visual acuity. At first sight, this could be compatible with the information-processing view of perception. However, the sound localization threshold is determined in a left/right localization task near the midline, and this task does not in fact require calibration: one only needs to know the sign of the ITD, that is, which ear the sound reaches first. Therefore, in the information-processing view, sound localization acuity should still be related to the temporal precision of neural “coding”. Making this hypothesis compatible with the information-processing view requires an additional evolutionary argument, which goes as follows. The sound localization system is optimized for a different task, absolute (not relative) localization, which requires calibration with vision. Therefore the temporal precision of neural firing, or of the binaural system, should match the precision required for that task. The authors find again that, once the effect of the field of best vision is removed, visual acuity is essentially uncorrelated with sound localization acuity (Fig. 8).
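The point that left/right discrimination needs no calibration can be made concrete with a toy computation: the decision uses only the sign of the cross-correlation peak between the two ear signals, never a mapping from ITD to angle. The signal parameters are invented for illustration.

```python
import numpy as np

def itd_sign(left, right):
    """Decide left/right from the sign of the cross-correlation peak lag.
    No calibration (no ITD-to-angle mapping) is needed: we only ask
    which ear the sound reaches first. Toy illustration, not the
    procedure used in the study."""
    corr = np.correlate(left, right, mode='full')
    lag = np.argmax(corr) - (len(right) - 1)
    # Negative peak lag: the left signal leads, so the source is on the left.
    return 'left' if lag < 0 else 'right'

rng = np.random.default_rng(0)
sig = rng.standard_normal(441)  # 10 ms of broadband noise at 44.1 kHz
delay = 5                       # samples; the right ear lags the left
left = sig
right = np.concatenate([np.zeros(delay), sig[:-delay]])
print(itd_sign(left, right))  # 'left'
```

Estimating the actual source angle from the peak lag, by contrast, would require knowing the ITD-to-angle mapping, which is where calibration could come in.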

Another evolutionary hypothesis could be that sound localization acuity is tuned to the particular needs of the animal. A predator, like a cat, would need a very accurate sound localization system to find prey that is hiding. A prey animal would probably not require such high accuracy to escape from a predator. An animal that is neither prey nor predator, like an elephant, would also not need high accuracy. It turns out that the elephant has one of the lowest localization thresholds of all mammals. Again, there is no significant correlation once the field of best vision is factored out.

In this study, it appears rather clearly that the single quantity that best predicts sound localization acuity is the width of the field of best vision. Note that this goes against the common view of the interaction between vision and hearing, according to which the visual system localizes the sound source and this estimate is used to calibrate the sound localization system. If this were right, we would rather expect sound localization acuity to match visual acuity.

In terms of function, the results suggest that animals use sound localization to move their eyes so that the source falls in the field of best vision. There are different ways to interpret this. The authors seem to follow the information-processing view, with the evolutionary twist: sound localization acuity reflects the precision of the auditory system, but that precision is adapted to the function of sound localization. One difficulty with this interpretation is that the auditory system is also involved in many tasks that are unrelated to sound localization, such as sound identification. Therefore, only the precision of the sound localization system itself, for example the size of the medial superior olive, which is involved in the processing of ITDs, should be tuned to the difficulty of the task. However, when thinking of intensity rather than timing differences, this view seems to imply that the precision of encoding of monaural intensities should be tuned to the difficulty of the binaural task.

Another difficulty comes from studies of vision-deprived or blind animals. There are only a few of them, but they tend to show that sound localization acuity actually improves. This could not occur if sound localization acuity reflected genetic limitations. The interpretation can be saved by replacing evolution with development: the sound localization system is tuned during development to reach a precision appropriate to the needs of the animal. For a sighted animal, these needs would be moving the eyes to the source, but for a blind animal they could be different.

An alternative interpretation, which rejects the information-processing view, is to consider that the meaning of binaural cues (ITD, ILD) can only come from what they imply for the animal, independently of the “encoding” precision. For a sighted animal, observing a given ITD would imply that moving the eyes or the head by a specific angle would put a moving object in the field of best vision. If perceiving direction is perceiving the movement that must be performed to put the source in the field of best vision, then sound localization acuity should correspond to the width of that field. For a blind animal, the connection with vision disappears, and so binaural cues must acquire a different meaning. This could be, for example, the movements required to reach the source. In this case, sound localization acuity could well be better than for a sighted animal.

In more operational terms, learning the association between binaural cues and movements (of the eyes or head) requires a feedback signal. In the calibration view, this feedback is the error between the predicted retinal location of the sound source and the actual location, given by the visual system. Here the feedback signal would rather be something like the total amount of motion in the visual field, or its correlation with sound, a quantity that would be maximized when the source is in the field of best vision. This feedback is more like a reward than a teacher signal.
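As a toy sketch of this reward-like feedback (my illustration, not a model from the paper), a bandit-style learner can associate discretized binaural cues with gaze shifts, where the reward is simply whether the source ends up within the field of best vision after the shift. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cues = 9                  # discretized ITD values
n_actions = 9               # candidate gaze shifts
angles = np.linspace(-60, 60, n_actions)   # gaze shifts, degrees
true_angle = np.linspace(-60, 60, n_cues)  # cue-to-angle mapping, unknown to the learner
best_field_width = 15.0     # half-width of the field of best vision, degrees

Q = np.zeros((n_cues, n_actions))       # running value estimates
counts = np.zeros((n_cues, n_actions))  # visit counts for sample averaging

for trial in range(20000):
    cue = rng.integers(n_cues)
    # Epsilon-greedy choice of gaze shift: explore 10% of the time.
    if rng.random() < 0.1:
        a = rng.integers(n_actions)
    else:
        a = np.argmax(Q[cue])
    # Reward: did the source land inside the field of best vision?
    # (A stand-in for "visual motion correlated with sound".)
    reward = float(abs(true_angle[cue] - angles[a]) < best_field_width)
    counts[cue, a] += 1
    Q[cue, a] += (reward - Q[cue, a]) / counts[cue, a]

# After learning, the preferred gaze shift for each cue puts the
# source inside the field of best vision.
learned = angles[np.argmax(Q, axis=1)]
print(np.all(np.abs(learned - true_angle) < best_field_width))
```

The key point of the sketch is that the feedback is a scalar reward, not an explicit error signal from the visual system, and that the achievable acuity is bounded by the width of the field of best vision, since any gaze shift within that width earns the same reward.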

Finally, I suggest a simple experiment to test this hypothesis. Gerbils have a rather homogeneous retina, with a field of best vision of 200°. Accordingly, their sound localization threshold is large, about 27°. The hypothesis predicts that, if gerbils were raised with an optical system (glasses) that creates an artificial fovea (enlarging a central part of the visual field), then their sound localization acuity should improve. Conversely, for an animal with a small field of best vision, such as the cat, an optical system that magnifies the visual field should degrade sound localization acuity. Finally, in humans with corrected vision, there should be a correlation between the type of correction and sound localization acuity.

This discussion also raises two points I will try to address later:

- If sound localization acuity reflects visual factors, then it should not depend on properties of the sound, as long as there are no constraints in the acoustics themselves (e.g. a pure tone may provide ambiguous cues).

- If sound localization is about moving the eyes or the head, then how about the feeling of distance, and other aspects of spatial hearing?

