Is perception about inference?

One philosophical theory about perception claims that perceiving is inferring the external world from the sensory signals. The argument goes as follows. Consider the retina: there is a spatially inhomogeneous set of photoreceptors; the image projected onto the retina is inverted, but you don’t see the world upside down; there are blood vessels that you normally don’t see; there is a blind spot where the optic nerve starts that you normally don’t notice; the color properties of photoreceptors and their spatial sampling are inhomogeneous, and yet color doesn’t change when you move the eyes. Perceptually, the visual field seems homogeneous and independent of the position of the eyes, apart from a change of perspective. So certainly what you perceive is not the raw sensations coming from your photoreceptors. These raw sensations are indirectly produced by things in the world, which have some constancy (compared to eye movements, for example). The visual signals in your retina are not constant, but somehow your perception is constant. Therefore, so the argument goes, your mind must be reconstructing the external world from the sensory signals, and what you perceive is this reconstruction.

Secondly, visual signals are ambiguous. A classical example is the Necker cube: a wire frame cube drawn in isometric perspective on a piece of paper, which can be perceived in two different ways. More generally, the three-dimensional world is projected on your retina as a two-dimensional image, and yet we see in three dimensions: the full 3D shape of objects must then be inferred. Another example is that in the dark, visual signals are noisy and yet you can see the world, although less clearly, and you don’t see noise.

I would then like to consider the following question: why, when I am looking at an apple, do I not see the back of the apple?

The answer is so obvious that the question sounds silly. Obviously, there is no light going through the object to our eyes, so how could we see anything behind it? Well, precisely: the inference view claims that we perceive things that are not present in the sensory signals but inferred from them. In the case of the Necker cube, there is nothing in the image itself that informs us of the true three-dimensional shape of the cube; there are just two consistent possibilities. But in the same way, when we see an apple, there are a number of plausible possibilities about what the back of the apple should look like, and yet we only see the front of the apple. Certainly we see an apple, and we can guess what the back of the apple looks like, but we do not perceive it. A counter-argument would be that inference about the world is partial: of course we cannot infer what is visually occluded by an object. But this is circular reasoning: perception is the result of inference, but we only infer what can be perceived.

One line of criticism of the objectivist/inferential view starts from Kant’s remark that anything we can ever experience comes from our senses, and therefore one cannot experience the objective world as such, even through inference, since we have never had access to the things to be inferred. This leads to James Gibson’s ecological theory of perception: Gibson considered that the (phenomenal) world is directly perceived as the invariant structure in the sensory signals (the laws that the signals follow, potentially including self-generated movements). This view is appealing in many respects because it solves the problem raised by Kant (who concluded that there must be an innate notion of space). But it does not account for the examples that motivate the inferential view, such as the Necker cube (or in fact the perception of drawings in general). A related view, O’Regan’s sensorimotor theory of perception, also considers that objects of perception must be defined in terms of relationships between signals (including motor signals) but does not reject the possibility of inference. Simply, what is to be inferred is not an external description of the world but the effect of actions on sensory signals.

So some of the problems of the objectivist inferential view can be solved by redefining what is to be inferred. However, it still remains that in an inferential process, the result of inference is in a sense always greater than its premises: it contains more than is directly implied by the current sensory signals. For example, if I infer that there is an apple, I can have some expectations about what the apple should look like if I turn it, and I may be wrong. But the part where I may be wrong, the predictions that I haven’t checked, I don’t actually see it – I can imagine it, perhaps.

Therefore, perception cannot be the result of inference. I suggest that perception involves two processes: 1) an inferential process, which consists in making a hypothesis about sensory signals and their relationship with action; 2) a testing process, in which the hypothesis is tested against sensory signals, possibly involving an action (e.g. an eye movement). These two processes can be seen as coupled, since new sensory signals are produced by the second process. I suggest that it is the second process (which is conditioned by the first one) that gives rise to conscious perception. In other words, to perceive is to check a hypothesis about the senses (possibly involving action). According to this proposition, subliminal perception is possible. That is, a hypothesis may be formed with insufficient time to test it. In this case, the stimulus is not perceived. But it may still influence the way subsequent stimuli are perceived, by influencing future hypotheses or tests.

Update. In The world as an outside memory, Kevin O'Regan expressed a similar view: "It is the act of looking that makes things visible".

What is sound? (X) What is loudness?

At first sight, it seems obvious what loudness is. A sound is loud when the acoustical wave carries a lot of energy. But if we think about it in detail, we quickly encounter difficulties. One obvious thing is that if we play the same sound at different levels, then clearly the feeling of loudness directly correlates with the amplitude of the sound, and therefore with the energy of the sound. But how about if we play two completely different sounds? Which one is louder? Should we consider the total energy? Probably not, because this would introduce a confusion with duration (the longer sound has more energy). So perhaps the average energy? But then what is the average energy of an impact sound, and how does it compare with that of a tone? Also, how about low sounds and high sounds: is the relationship between energy and loudness the same for both? And does a sound feel as loud in a quiet environment as in a noisy environment? Does it depend on what sounds were played before?

I could go on indefinitely, but I have made the point that loudness is a complex concept, and its relationship with the acoustic signal is not straightforward at all.

Let us see what can be said about loudness. First of all, we can say that a sound is louder than another sound, even if the two sounds are completely different. This may not be true of all pairs of sounds, but certainly I can consider that a low amplitude tone is weak compared to the sound made by a glass breaking on the floor. So certainly there seems to be an order relationship in loudness, although perhaps partial. Also, it is true that scaling the acoustical wave has the effect of monotonically changing the loudness of the sound. So there is definitely a relationship with the amplitude, but only in that scaling sense: it is not determined by simple physical quantities such as the peak pressure or the total energy.

Now it is interesting to think for a while about the notion of a sound being “not loud enough” and of a sound being “too loud”, because it appears that these two phrases do not refer to the same concept. We say that a sound is “not loud enough” when we find it hard to hear, when it is difficult to make sense of it. For example we ask someone to speak louder. Thus this notion of loudness corresponds to intelligibility, rather than acoustical energy. In particular, this is a relative notion, in the sense that intelligibility depends on the acoustical environment – background noise, other sources, reverberation, etc.

But saying that a sound is “too loud” refers to a completely different concept. It means that the sound produces an uncomfortable feeling because of its intensity. This is unrelated to intelligibility: someone screaming may produce a sound that is “too loud”, but two people screaming would also produce a sound that is “too loud”, even though intelligibility decreases. Therefore, there are at least two different notions regarding loudness: a relative notion related to intelligibility, and a more absolute one related to an unpleasant or even painful feeling. Note that it can also be said that a sound is too loud in the sense of intelligibility. For example, it can be said that the TV is too loud because it makes it hard to understand someone speaking to us. So the notion of loudness is multiform, and therefore cannot be mapped to a single scale.

Loudness as in “not loud enough” (intelligibility) is rather simple to understand. If the signal-to-noise ratio is too low, then it is more difficult to extract the relevant information from the signal, and this is what is meant by “not loud enough”. Of course there are subtleties and the relationship between the acoustical signals and intelligibility is complex, but at least it is relatively clear what it is about. In contrast, it is not so straightforward what “too loud” means. Why would a sound be unpleasant because the acoustical pressure is large?

First of all, what does it mean that something is unpleasant or painful? Something unpleasant is something that we want to avoid. But this is not a complete characterization: it is not only a piece of information that is taken into account in decision making; it has the character of an uncontrollable feeling, something that we cannot avoid being subjected to. In other words, it is an emotion. Being controlled by this emotion means acting so as to escape the unpleasant sound, for example by putting one’s hands over one’s ears. Consciously trying not to act in such a way would be considered as “resisting” this emotion. This terminology implies that loudness (as in “too loud”) is an involuntary avoidance reaction of the organism to sounds, one that involves attenuating the sounds. Therefore, loudness is not only about objective properties of the external world, but also about our biological self, or more precisely about the effect of sounds on our organism.

Why would a loud sound trigger an avoidance reaction? We can speculate on different possibilities.

1) A loud sound may indicate a threat. There is indeed a known reflex called the “startle reflex”, with a latency of around 10 ms (Yeomans and Frankland, Brain Research Reviews 1996). In response to sudden unexpected loud sounds, there is an involuntary contraction of muscles, which briefly stiffens the neck in particular. The reflex is found in all mammals and involves a short pathway in the brainstem. It is also affected by previous sounds and emotional state. However, this reflex is triggered only by a small subset of sounds, which are sudden and normally very loud (over 80 dB).

2) A very loud sound can damage the cochlea (destroy hair cells). At very high levels, it can even be painful. Note that a moderately loud sound can also damage the cochlea if it lasts long. Thus, the feeling of loudness could be related to the emotional reaction aimed at avoiding damage to the cochlea. Note that while cochlear damage depends on duration, loudness does not. That is, a continuous pure tone seems just as loud at the beginning as 1 minute into it, and yet because damage depends on continuous exposure, an avoidance reaction should be more urgent in the latter case than in the former. Even for very loud sounds, the feeling of loudness does not seem to increase with time: it may seem more and more urgent to avoid the sound, but it does not feel louder. We can draw two conclusions: 1) the feeling of loudness, or of a sound being too loud, cannot correspond to an accurate biological measurement of potential cochlear damage, since it remains constant when the sound is stationary; 2) the feeling of a sound being “too loud” probably doesn’t correspond to the urgency of avoiding that sound, since this urgency can increase (emotionally) without a corresponding increase in loudness. It could be that the emotional content (“too loud”) comes in addition to the perceptual content (a certain degree of loudness), and that only the latter is constant for a stationary sound.

3) Another possibility is that loudness correlates with the energy consumption of the auditory periphery (possibly of the auditory system in general). Indeed when the amplitude of an acoustical wave is increased, the auditory nerve fibers and most neurons in the auditory system fire more. Brain metabolism is tightly regulated, and so it is not at all absurd to postulate that there are mechanisms to sense the energy consumption due to a sound. However, this is not a very satisfying explanation of why a sound would feel “too loud”. Indeed why would the organism feel an urge to avoid a sound because it incurs a large energy consumption, when there could be mechanisms to reduce that consumption?

In this post, I have addressed two aspects of loudness: intelligibility (“not loud enough”) and emotional content (“too loud”). These two aspects are “proximal”, in the sense that they are determined not so much by the sound source as by the acoustical wave at the ear. In the next post, I will consider distal aspects of loudness, that is, those aspects of loudness that are determined by the sound source.

Neural coding and the invariance problem

In sensory systems, one of the hardest computational problems is the “invariance problem”: the same perceptual category can be associated with a large diversity of sensory signals. A classical example is the problem of recognizing a face: the same face can appear with different orientations relative to the observer, and under different lighting conditions, and it is a challenge to design a recognition system that is invariant to these sources of variation.

In computational neuroscience, the problem is usually framed within the paradigm of statistical learning theory as follows. Perceptual categories belong to some set Y (the set of faces). Sensory signals belong to some high-dimensional sensory space X (e.g. pixels). Each particular category (a particular face) corresponds to a specific set of signals in X (different views of the face) or to a distribution on X. The goal is to find the correct mapping from X to Y from particular labeled examples (a particular view x of a face, the name y corresponding to that face). This is also the view that underlies the “neural coding” paradigm, where there is a communication channel between Y and X, and X contains “information” about Y.

Framed in this way, this is a really difficult problem in general, and it requires many examples to form categories. However, there is a different way of approaching the problem, which follows from the concept of “invariant structure” developed by James Gibson. It starts with the observation that a sensory system does not receive a static input (an image) but rather a sensory flow. This is obvious in hearing (sounds are carried by acoustic waves, which vary in time), but it is also true of vision: the eyes are constantly moving even when fixating an object (e.g. high-frequency tremors). A perceptual system is looking for things that do not vary within this sensory flow, the “invariant structure”, because this is what defines the essence of the world.

I will develop the example of sound localization. When a source produces a sound, there are time-varying acoustical waves propagating in the air, and possibly reaching the ears of a listener. The input to the auditory system is two time-varying signals. Through the sensory flow, the identity and spatial location of the source are unchanged. Therefore, any piece of information about these two things must be found in properties of the auditory signals that are invariant through the sensory flow. For example, if we neglect sound diffraction, the fact that one signal is a delayed copy of the other, with a particular delay, is true as long as the sound exists. An invariant property of the acoustic signals is not necessarily about the location of the sound source. It could be about the identity of the source, for example (the speaker). However, if that property is no longer invariant when movements are produced by the organism, then that property cannot be an intrinsic property of the source, but rather says something about the relative location of the sound source.
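
To make this concrete, here is a minimal numpy sketch of the delayed-copy invariant (my own toy construction, not from any model discussed here: the sampling rate, delay and window size are arbitrary choices, and diffraction is ignored as above). The raw binaural signals change from one time window to the next, but the estimated delay does not: it is a property that holds throughout the sensory flow.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 44100                # sampling rate in Hz (arbitrary choice)
itd = 20                  # true delay in samples (~0.45 ms), arbitrary
n = fs                    # one second of "sensory flow"

# Idealized binaural signals (no diffraction): the right signal is a pure
# delayed copy of the left signal, for as long as the sound exists.
source = rng.standard_normal(n + itd)      # broadband source
left = source[itd:]                        # length n
right = source[:n]                         # right lags left by itd samples

def estimated_delay(l, r, max_lag=50):
    """Peak lag k of sum_t r[t] * l[t - k]; positive k means the right signal lags."""
    lags = np.arange(-max_lag, max_lag + 1)
    core = r[max_lag:-max_lag]
    corr = [np.dot(core, l[max_lag - k:len(l) - max_lag - k]) for k in lags]
    return lags[int(np.argmax(corr))]

# The raw signals differ from one 100 ms window to the next,
# but the estimated delay (the invariant property) does not.
for start in range(0, n - 4410, 4410):
    w = slice(start, start + 4410)
    print(round(start / fs, 2), estimated_delay(left[w], right[w]))
```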

In this framework, the computational problem of sound localization is in two stages: 1) for any single example, pick up an acoustical invariant that is affected by head movements, 2) associate these acoustical invariants with sound location (either externally labeled, or defined with head movements). The second stage is essentially the computational problem defined in the neural coding/statistical learning framework. But the first stage is entirely different. It is about finding an invariant property within a single example, and this only makes sense if there is a sensory flow, i.e., if time is involved within a single example and not just across examples.

There is a great benefit in this approach, which is to solve part of the invariance problem from the beginning, before any category is assigned to an example. For example, a property about the binaural structure produced by a broadband sound source at a given position will also be true for another sound source at the same position. In this case, the invariance problem has disappeared entirely.

Within this new paradigm, the learning problem is now: given a set X of time-varying sensory signals produced by sound sources, how to find a mapping from X to some other space Y such that the images of sensory signals through this mapping do not vary over time, but vary across sources? Phrased in this way, this is essentially the goal of slow feature analysis. However the slow feature algorithm is a machine learning technique, whose biological instantiation is not straightforward.
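
Setting the biological question aside, the linear version of this objective can be written down in a few lines. The sketch below is a toy illustration (my own construction, with arbitrary made-up signals, not a model of the auditory system): whiten the observed signals, then take the unit-variance projection whose temporal differences have the smallest variance.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 5000)

# Toy sensory flow: one slow latent variable (standing in for a source property)
# mixed with fast fluctuations and observed through four channels.
slow = np.sin(2 * np.pi * 0.2 * t)
fast = rng.standard_normal((3, t.size))
x = rng.standard_normal((4, 4)) @ np.vstack([slow, fast])   # observed signals

# 1) Whiten the observations (zero mean, identity covariance).
x = x - x.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(x @ x.T / x.shape[1])
white = np.diag(evals ** -0.5) @ evecs.T @ x

# 2) Among unit-variance linear readouts, find the one whose temporal
# variation is smallest: smallest eigenvector of the covariance of differences.
d = np.diff(white, axis=1)
dvals, dvecs = np.linalg.eigh(d @ d.T / d.shape[1])
slow_feature = dvecs[:, 0] @ white

# The extracted feature recovers the slow latent variable (up to sign and scale).
print(abs(np.corrcoef(slow_feature, slow)[0, 1]))
```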

There have been similar ideas in the field. In a highly cited paper, Peter Földiak proposed a very simple unsupervised Hebbian rule based on related considerations (Földiak, Neural Comp 1991). The study focused on the development of complex cells in the visual system, which respond to edges independently of their location. The complex cell combines inputs from simple cells, which respond to specific edges, and the neuron must learn the right combination. The invariance is learned by presenting moving edges, that is, it is looked for within the sensory flow and not across independent examples. The rule is very simple: it is a Hebbian rule in a rate-based model, where the instantaneous postsynaptic activity is replaced by a moving average. The idea is simply that, if the output must be temporally stable, then the presynaptic activity should be paired with the output at any time. Another paper by Schraudolph and Sejnowski (NIPS 1992) is actually about finding the “invariant structure” (with no mention of Gibson) using an anti-Hebbian rule, but this means that neurons signal the invariant structure by not firing, which is not what neurons in the MSO seem to be doing (although perhaps the idea might be applicable to the LSO).
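
Földiak’s trace rule, described above, is simple enough to sketch. The code below is my paraphrase of the idea (a Hebbian term using a moving average of the postsynaptic rate, plus explicit weight normalization), not the exact equations of the 1991 paper; the moving-edge stimulus and all parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pos = 10                  # "simple cells", each tuned to one edge position
w = rng.random(n_pos)
w /= w.sum()                # normalization constraint on the weights

eta = 0.02                  # learning rate
tau = 0.2                   # update rate of the moving average (the "trace")

for sweep in range(200):
    # One moving edge: the active input position drifts across the array,
    # so successive inputs belong to the same temporally extended stimulus.
    start = rng.integers(n_pos)
    y_trace = 0.0
    for step in range(n_pos):
        x = np.zeros(n_pos)
        x[(start + step) % n_pos] = 1.0           # current edge position
        y = w @ x                                  # postsynaptic rate
        y_trace = (1 - tau) * y_trace + tau * y    # moving average of the output
        w += eta * y_trace * x                     # Hebbian rule with the trace
        w /= w.sum()                               # keep the weights normalized

# With the trace, the weights spread over the positions visited during a sweep,
# so the output becomes roughly invariant to edge position, like a complex cell.
print(np.round(w, 2))
```

With the instantaneous output instead of the trace, the same rule would concentrate the weights on a single position (the usual rich-get-richer behavior of Hebbian learning under normalization); the trace is what ties together the positions visited within one sweep.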

There is a more recent paper in which slow feature analysis is formally related to Hebbian rules and to STDP (Sprekeler et al., PLoS CB 2007). Essentially, the argument is that minimizing the temporal variation of the output is equivalent to maximizing the variance of the low-pass filtered output. In other words, they provide a link between slow feature analysis and Földiak’s simple algorithm. There are also constraints, in particular the synaptic weights must be normalized. Intuitively this is obvious: to aim for a slowly varying output is the same thing as to aim for increasing the low-frequency power of the signal. The angle in the paper is rather on rate models, but it gives a simple rationale for designing learning rules that promote slowness. In fact, it appears that the selection of slow features follows from the combination of three homeostatic principles: maintaining a target mean potential and a target variance, and minimizing the temporal variation of the potential (through maximizing the variance of the low-pass filtered signal). The potential may be replaced by the calcium trace of spike trains, for example.
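
The stated equivalence between slowness and low-pass power is easy to check numerically. The toy example below is my own (the signals and the filter constant are arbitrary choices, and this is not the derivation of the paper): it scans unit-variance mixtures of a slow and a fast signal and computes both criteria.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 5000)

# Two decorrelated, unit-variance signals: one slow, one fast.
s_slow = np.sqrt(2) * np.sin(2 * np.pi * 0.3 * t)
s_fast = rng.standard_normal(t.size)

def low_pass(y, alpha=0.02):
    """First-order low-pass filter (exponential moving average)."""
    out = np.empty_like(y)
    acc = 0.0
    for i, v in enumerate(y):
        acc += alpha * (v - acc)
        out[i] = acc
    return out

# Unit-variance outputs ranging from purely slow (a = 0) to purely fast (a = pi/2).
angles = np.linspace(0, np.pi / 2, 10)
temporal_variation = []   # mean squared temporal difference, to be minimized
low_pass_power = []       # variance of the low-pass filtered output, to be maximized
for a in angles:
    y = np.cos(a) * s_slow + np.sin(a) * s_fast
    temporal_variation.append(np.mean(np.diff(y) ** 2))
    low_pass_power.append(np.var(low_pass(y)))

# Both criteria select the same output (a = 0), and across the whole family
# they vary in opposite directions (correlation close to -1).
print(angles[np.argmin(temporal_variation)], angles[np.argmax(low_pass_power)])
print(np.corrcoef(temporal_variation, low_pass_power)[0, 1])
```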

It is relatively straightforward to see how this might be applied for learning to decode the activity of ITD-sensitive neurons in the MSO into the location of the sound source. For example, a target neuron combines inputs from the MSO into the membrane potential, and the slowness principle is applied to either the membrane potential or the output spike train. As a result, we expect the membrane potential and the firing rate of this neuron to depend only on sound location. These neurons could be in the inferior colliculus, for example.

But can this principle also be applied to the MSO? In fact, the output of a single neuron in the MSO does not depend only on sound location, even for those neurons with a frequency-dependent best delay. Their output also depends on sound frequency, for example. But is it possible that their output is as slow as possible, given the constraints? It might be so, but another possibility is that only some property of the entire population is slow, and not the activity of individual neurons. For example, in the Jeffress model, only the identity of the maximally active neuron is invariant. But then we face a difficult question: what learning criterion should be applied at the level of an individual neuron so that there is a slow combination of the activities of all neurons?

I can imagine two principles. One is a backpropagation principle: the value of the criterion in the target neurons, i.e., slowness, is backpropagated to the MSO neurons and acts as a reward. The second is that the slowness criterion is applied at the cellular level not to the output of the cell, but to a signal representing a combined activity of neurons in the MSO, for example the activity of neighboring neurons.

Does free market theory support free markets?

In economics, free market theoreticians such as Milton Friedman have shown that under a number of assumptions, free markets are efficient. In particular, there is no unemployment and resources are well distributed. This is based on conceptual arguments and a good deal of mathematics, rather than on empirical evidence. The epistemology of economics is a bit peculiar, compared to other sciences. Indeed, economic theories have both an empirical value (you want to account for past and future economic observations) and a prescriptive value (you want to use theoretical principles to recommend economic policies). So in the face of contradicting evidence (there is unemployment in market economies), strong supporters of free market theory argue that, since the theory is valid, it must be that real markets are not actually free, and so they should be freed of all types of regulations.

First of all, how scientists could get away with such argumentation is puzzling for anyone with an interest in epistemology. If the evidence supports the theory, then the theory is corroborated; if it doesn't, then the theory is also corroborated. This is precisely the kind of theory that Karl Popper called metaphysical: there is no way you can falsify it (like "there is a God").

But not all economists, and I would venture only a minority of economists (but perhaps not of politicians and financial executives), would argue along such dogmatic lines. Over the years, leading economists have identified a number of ways in which real markets do not and cannot comply with the assumptions of free market theory. For example, people are not rational in the sense of that theory (which postulates that you can predict all the consequences of your actions, at least in a probabilistic way), information is neither perfect nor symmetrical between economic agents, competition is not always guaranteed, and there are externalities (consequences of individual decisions that impact agents not involved in the decision process).

All this is well known, at least in the economic field. However what most people do not realize, I believe, is that even at a conceptual level, free market theory does not actually support free markets.

One basic result of free market theory is that, if agents are only motivated by self-interest and there is complete information and fair competition, then profit should be very small in any transaction. Indeed, if an economic agent were selling a product with a very large profit margin, then soon enough another economic agent would sell the same product at a lower price and still make a sizeable profit. Free marketeers usually stop here: great, in a free market economy, prices reflect the fair value of products. But let us not stop here and examine the consequences of this result. If agents are motivated by self-interest and the result of fair competition and complete information is that no profit is made, then agents have a direct incentive to create monopolies and to hide or manipulate information, and they will avoid any situation in which they cannot do so. As a consequence, some agents make a large profit at the expense of global economic efficiency. The evidence for such behavior is everywhere, and mostly through legal means. An obvious example of manipulating information is advertising, which is produced by the very companies that make the products. The goal of advertising is precisely to have a biased influence on the decisions of customers. Another example is selling credit to customers against their own interests. Examples of monopoly-seeking behavior are many: territorial intellectual property strategies (i.e. patenting so as to own a particular sector rather than for immediate exploitation) and patenting in general, monopolies in operating systems, and of course illegal agreements on prices in certain economic sectors. Creating monopolies is precisely the purpose of marketing, which is to differentiate the company's products from the rest: to present a product in such a specific way that the company is the only one to produce it. As a result, prices can go up because there is no competition, and no other company has any interest in entering the competition, since doing so would make prices drop and generate no profit. The healthcare system in the US is another example: a system where prices are freely set by a market with captive customers fearing for their lives, resulting in the most expensive system in the world by far, and yet not at all the most efficient.

Free market theory demonstrates that in a free market economy, economic agents should adopt monopolistic and manipulative strategies that go against global economic efficiency. This is the part of free market theory that has empirical support.

What is sound? (IX) Sound localization and vision

In this post, I want to come back to a remark I made in a previous post, on the relationship between vision and spatial hearing. It appears that my account of the comparative study of Heffner and Heffner (Heffner & Heffner, 1992) was not accurate. Their findings are in fact even more interesting than I thought. They found that sound localization acuity across mammalian species is best predicted not by visual acuity, but by the width of the field of best vision.

Before I comment on this result, I need to explain a few details. Sound localization acuity was measured behaviorally in a left/right discrimination task near the midline, with broadband sounds. The authors report this discrimination threshold for 23 mammalian species, from gerbils to elephants. They then try to relate this value to various other quantities: the largest interaural time difference (ITD), which is directly related to head size, visual acuity (highest angular density of retinal cells), whether the animals are predators or prey, and the field of best vision. The latter quantity is defined as the angular width of the retina in which angular cell density is at least 75% of the highest density. So this quantity is directly related to the inhomogeneity of cell density in the retina.

The results of the comparative study are not straightforward (I find). Let us consider a few hypotheses. One hypothesis goes as follows. Sound localization acuity is directly related to the temporal precision of firing of auditory nerve fibers. If this precision is similar for all mammals, then this should correspond to a constant ITD threshold. In terms of angular threshold, sound localization acuity should then be inversely proportional to the largest ITD, and to head size. The same reasoning would go for intensity differences. Philosophically speaking, this corresponds to the classical information-processing view of perception: there is information about sound direction in the ITD, as reflected in the relative timing of spikes, and so sound direction can be estimated with a precision that is directly related to the temporal precision of neural firing. As I have argued many times in this blog, the flaw in the information-processing view is that information is defined with respect to an external reference (sound direction), which is accessible for an external observer. Nothing in the spikes themselves is about space: why would a difference in timing between two specific neurons produce a percept of space? It turns out that, of all the quantities the authors looked at, largest ITD is actually the worst predictor of sound localization acuity. Once the effect of best field of vision is removed, it is essentially uncorrelated (Fig. 8).

A second hypothesis goes as follows. The auditory system can estimate the ITD of sounds, but to interpret this ITD as the angle of the sound source requires calibration (learning), and this calibration requires vision. Therefore, sound localization acuity is directly determined by visual acuity. At first sight, this could be compatible with the information processing view of perception. However, the sound localization threshold is determined in a left/right localization task near the midline, and in fact this task does not require calibration. Indeed, one only needs to know which of the two ITDs is larger. Therefore, in the information-processing view, sound localization acuity should still be related to the temporal precision of neural “coding”. To make this hypothesis compatible with the information-processing view requires an additional evolutionary argument, which goes as follows. The sound localization system is optimized for a different task, absolute (not relative) localization, which requires calibration with vision. Therefore the temporal precision of neural firing, or of the binaural system, should match the required precision for that task. The authors find again that, once the effect of best field of vision is removed, visual acuity is essentially uncorrelated with sound localization acuity (Fig. 8).

Another evolutionary hypothesis could be that sound localization acuity is tuned for the particular needs of the animal. So a predator, like a cat, would need a very accurate sound localization system to be able to find a prey that is hiding. A prey would probably not require such high accuracy to be able to escape from a predator. An animal that is neither a prey nor a predator, like an elephant, would also not need high accuracy. It turns out that the elephant has one of the lowest localization thresholds of all mammals. Again, there is no significant correlation once the best field of vision is factored out.

In this study, it appears rather clearly that the single quantity that best predicts sound localization acuity is the width of the best field of vision. First of all, this goes against the common view of the interaction between vision and hearing. According to this view, the visual system localizes the sound source, and this estimation is used to calibrate the sound localization system. If this were right, we would rather expect that localization acuity corresponds to visual acuity.

In terms of function, the results suggest that sound localization is used by animals to move their eyes so that the source is in the field of best vision. There are different ways to interpret this. The authors seem to follow the information-processing view, with the evolutionary twist: sound localization acuity reflects the precision of the auditory system, but that precision is adapted for the function of sound localization. One difficulty with this interpretation is that the auditory system is also involved in many other tasks that are unrelated to sound localization, such as sound identification. Therefore, only the precision of the sound localization system should be tuned to the difficulty of the task, for example the size of the medial superior olive, which is involved in the processing of ITDs. However, when thinking of intensity rather than timing differences, this view seems to imply that the precision of encoding of monaural intensities should be tuned to the difficulty of the binaural task.

Another difficulty comes from studies of vision-deprived or blind animals. There are a few such studies, and they tend to show that sound localization acuity actually improves. This could not occur if sound localization acuity reflected genetic limitations. The interpretation can be saved by replacing evolution with development. That is, the sound localization system is tuned during development to reach a precision appropriate for the needs of the animal. For a sighted animal, these needs would be moving the eyes to the source, but for a blind animal it could be different.

An alternative interpretation that rejects the information-processing view is to consider that the meaning of binaural cues (ITD, ILD) can only come from what they imply for the animal, independently of the “encoding” precision. For a sighted animal, observing a given ITD would imply that moving the eyes or the head by a specific angle would put a moving object in the best field of view. If perceiving direction is perceiving the movement that must be performed to put the source in the best field of view, then sound localization acuity should correspond to the width of that field. For a blind animal, the connection with vision disappears, and so binaural cues must acquire a different meaning. This could be, for example, the movements required to reach the source. In this case, sound localization acuity could well be better than for a sighted animal.

In more operational terms, learning the association between binaural cues and movements (of the eyes or head) requires a feedback signal. In the calibration view, this feedback is the error between the predicted retinal location of the sound source and the actual location, given by the visual system. Here the feedback signal would rather be something like the total amount of motion in the visual field, or its correlation with sound, a quantity that would be maximized when the source is in the best field of vision. This feedback is more like a reward than a teacher signal.

Finally, I suggest a simple experiment to test this hypothesis. Gerbils have a rather homogeneous retina, with a best field of vision of 200°. Accordingly, sound localization threshold is large, about 27°. The hypothesis would predict that, if gerbils were raised with an optical system (glasses) that creates an artificial fovea (enlarge a central part of the visual field), then their sound localization acuity should improve. Conversely, for an animal with a small field of best vision (cats), using an optical system that magnifies the visual field should degrade sound localization acuity. Finally, in humans with corrected vision, there should be a correlation between the type of correction and sound localization acuity.

This discussion also raises two points I will try to address later:

- If sound localization acuity reflects visual factors, then it should not depend on properties of the sound, as long as there are no constraints in the acoustics themselves (e.g. a pure tone may provide ambiguous cues).

- If sound localization is about moving the eyes or the head, then how about the feeling of distance, and other aspects of spatial hearing?


What does it mean to represent a relation?

In this blog, I have argued many times that if there are neural representations, these must be about relations. For example, a relation between two sensory signals, or between a potential action and its effect on the sensory signals. But what does it mean exactly that something (say neural activity) “represents” a relation? It turns out that the answer is not so obvious.

The classical way to understand it is to consider that a representation is something (an event, a number of spikes, etc) that stands for the thing being represented. That is, there is a mapping between the thing being represented and the thing that represents it. For example, in the Jeffress model of sound localization, the identity of the most active binaural neuron stands for the location of the sound, or in terms of relation, for the fact that the right acoustical signal is a delayed copy of the left acoustical signal, with a specific delay. The difficulty here is that a representation always involves three elements: 1) the thing to be represented, 2) the thing that represents it, 3) the mapping between the first two things. But in the classical representational view, we are left with only the second element. In what sense does the firing of a binaural neuron tell us that there is such a specific relation between the monaural signals? Well, it doesn’t, unless we already know in advance that this is what the firing of that neuron stands for. But from observing the firing of the binaural neurons, there is no way we can ever know that: we just see neurons lighting up sometimes.

There are different ways to address this issue. The simplest one is simply to say: it doesn’t matter. The activity of the binaural neurons represents a relationship between the monaural neurons, at least for us external observers, but the organism doesn’t care: what matters is that their activity can be related to the location of the sound source, defined for example as the movement of the eyes that puts the sound source in the fovea. In operational terms, the organism must be able to take an action conditionally on the validity of a given relation, but what this relation exactly is in terms of the acoustical signals doesn’t matter.

An important remark is in order here. There is a difference between representing a relation and representing a quantity (or vector), even in this simple notion of representation. A relation is a statement that may be true or not. This is different from a quantity resulting from an operation. For example, one may always calculate the peak lag of the cross-correlation function between two acoustical signals, and call this “the ITD” (interaural time difference). But such a number is obtained whether there is a source, several sources, or no source at all. Thus, this is not the same as relations of the form: the right signal equals the left signal delayed by 500 µs. Therefore, we are not speaking of a mapping between acoustical signals and action, which would be unconditional, but of actions conditional on a relation in the acoustical signals.
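
To make the distinction concrete, here is a toy sketch (my own construction, not a model from the text). The peak lag of the cross-correlation is always defined, even for two unrelated signals; the relation “the right signal is the left signal delayed by k” is a statement that can be checked, and can be false.

```python
import numpy as np

rng = np.random.default_rng(4)
n, max_lag = 20000, 50

def peak_lag(l, r):
    """'The ITD': lag k maximizing sum_t r[t] * l[t - k]. Always returns a number."""
    lags = np.arange(-max_lag, max_lag + 1)
    core = r[max_lag:-max_lag]
    return lags[int(np.argmax([np.dot(core, l[max_lag - k:len(l) - max_lag - k]) for k in lags]))]

def relation_holds(l, r, k, tol=1e-6):
    """The relation 'right = left delayed by k samples' is a statement that can fail."""
    residual = r[max_lag:-max_lag] - l[max_lag - k:len(l) - max_lag - k]
    return np.mean(residual ** 2) < tol * np.mean(r ** 2)

# Case 1: a single source; the right signal is a delayed copy of the left signal.
src = rng.standard_normal(n + 25)
left, right = src[25:], src[:n]
k = peak_lag(left, right)
print(k, relation_holds(left, right, k))      # a lag, and the relation holds

# Case 2: two independent signals (no single source).
left2, right2 = rng.standard_normal(n), rng.standard_normal(n)
k2 = peak_lag(left2, right2)
print(k2, relation_holds(left2, right2, k2))  # still a lag, but no such relation
```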

Now there is another way to understand the phrase “representing a relation”, which is in a predictive way: if there is a relation between A and B, then representing the relation means that from A and the relation, it is possible to predict B. For example: saying that the right signal is a delayed copy of the left signal, with delay 500 µs, means that if I know that the relation is true and I have the left signal, then I can predict the right signal. In the Jeffress model, or in fact in any model that represents the relation in the previous sense, it is possible to infer the right signal from the left signal and the representation, but only if the meaning of that representation is known, i.e., if it is known that a given neuron firing stands for “B comes 500 µs after A”. This is an important distinction with the previous notion of representation, where the meaning of the relation in terms of acoustics was irrelevant.

We now have a substantial problem: where does the meaning of the representation come from? The firing of binaural neurons in itself does not tell us anything about how to reconstruct signals. To see the problem more clearly, imagine that the binaural neurons develop by selecting axons from both sides. In the end there is a set of binaural neurons whose firing stands for binaural relations with different ITDs. But by just looking at the activity of the binaural neurons after development, or at the activity of both the binaural neurons and the left monaural acoustical signal, it is impossible to know what the ITD is, or what the right acoustical signal is at any time. To be able to do this, one actually needs to have learned the meaning of the representation carried by the binaural neurons, and this learning seems to require both monaural inputs.

It now seems that this second notion of representation is not very useful, since in any case it requires all terms of the relation. This brings us to the notion that to represent a relation, and not just a specific instantiation of it (i.e., that these particular signals have such a property), it must be represented in a way that may apply to any instantiation. For example, if I know that a source is at a given location, I can imagine, for any left signal, what the right signal should be. Or, given a signal on the left, I can imagine what the right signal would be if the source were at a given location.

I’m ending this post with probably more confusion than when I started. This is partly intended. I want to stress here that once we start thinking of perceptual representations in terms of relations, then classical notions of neural representations quickly seem problematic or at least insufficient.

Information about what?

In a previous post, I pointed out that the word “information” is almost always used in neuroscience in the sense of information theory, and this is a very restricted notion of information that leads to dualism in many situations. There is another way to look at this issue, which is to ask the question: information about what?

In discussions of neural information or “codes”, there are always three elements involved: 1) what carries the information, e.g. neural electrical activity, 2) what the information is about (e.g. the orientation of a bar), 3) the correspondence between the first two elements. If dualism is rejected, then all the information an organism ever gets from the world must come from its own senses (and the effect of actions on them). Therefore, if one speaks of information for the organism, as opposed to information for an external observer, then the key point to consider is that the information should be about something intrinsic to the organism. For example, it should not be about an abstract parameter of an experimental protocol.

So what kind of information are we left with? For example, there may be information in one sensory signal about another sensory signal, in the sense that one can be (partially) predicted from the other. Or there can be information in a sensory signal about the future signal. This is equivalent to saying that the signals follow some law, a theme developed for example by Gibson (invariant structure) and O’Regan (sensorimotor contingency).

One might think that this conception of information would imply that we can’t know much about the world. But this is not true at all, because there is knowledge about the world coming from the interaction of the organism with the world. Consider space, for example. A century ago, Poincaré noted that space, with its topology and structure, can be entirely defined by the effect of our own movements on our senses. To simplify, assume that the head and eyes are fixed in our body and we can move only by translations, although with possibly complex movements. We can go from point A to point B through some movements. Points A and B differ by the visual inputs. Movements act on the set of points (= visual inputs) as a group action that has the structure of a two-dimensional Euclidean space (for example, for each movement there is an opposite movement that takes you back to the previous point, combinations of movements are commutative, etc). This defines space as a two-dimensional affine space. In fact, Poincaré (and also Einstein) went further and noted that space as we know it is necessarily defined with respect to an observer. For example, we can define absolute Cartesian coordinates for space, but the reference point is arbitrary; only relative statements are actually meaningful.
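
To make this less abstract, here is a small toy sketch (entirely my own construction, with made-up landmarks and movements): “points” are identified with the visual inputs obtained at a position, movements are translations, and the group properties mentioned above (inverse movements, commutativity) are checked purely through their effects on the visual inputs.

```python
import numpy as np

# A fixed "world" of landmarks; the observer never accesses their coordinates
# directly, only the visual input produced at its current (hidden) position.
landmarks = np.array([[0.0, 0.0], [3.0, 1.0], [-2.0, 4.0], [1.0, -5.0]])

def visual_input(position):
    """What the observer senses at a given position: angles and apparent sizes
    of the landmarks (a crude stand-in for an image)."""
    rel = landmarks - position
    angles = np.arctan2(rel[:, 1], rel[:, 0])
    sizes = 1.0 / np.linalg.norm(rel, axis=1)
    return np.concatenate([angles, sizes])

def move(position, displacement):
    """A movement acts on positions; the observer only sees its effect on the input."""
    return position + np.asarray(displacement)

p = np.array([0.5, 0.5])
a, b = [1.0, -2.0], [-0.5, 3.0]

# Each movement has an inverse that brings the visual input back to the previous point.
back = move(move(p, a), [-a[0], -a[1]])
print(np.allclose(visual_input(back), visual_input(p)))        # True

# Movements commute: a then b gives the same visual input as b then a.
print(np.allclose(visual_input(move(move(p, a), b)),
                  visual_input(move(move(p, b), a))))          # True
```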

In summary, it is not so much that the concept of information or code is completely irrelevant in itself. The issues arise when one speaks of codes about something external to the organism. In the end, this is nothing other than a modern version of dualism (as Dennett pointed out with his “Cartesian theater”). Rejecting dualism implies that any information relevant to an organism must be about something that the organism can do or observe, not about what an external observer can define.