What is sound? (X) What is loudness?

At first sight, it seems obvious what loudness is. A sound is loud when the acoustical wave carries a lot of energy. But if we think about it in detail, we quickly encounter difficulties. One obvious observation is that if we play the same sound at different levels, the feeling of loudness correlates directly with the amplitude of the sound, and therefore with its energy. But what if we play two completely different sounds? Which one is louder? Should we consider the total energy? Probably not, because this would introduce a confusion with duration (the longer sound has more energy). So perhaps the average energy? But then what is the average energy of an impact sound, and how does it compare with that of a tone? Also, what about low-pitched and high-pitched sounds: is the relationship between energy and loudness the same for both? And does a sound feel as loud in a quiet environment as in a noisy environment? Does it depend on what sounds were played before?

I could go on indefinitely, but I have made the point that loudness is a complex concept, and its relationship with the acoustic signal is not straightforward at all.

Let us see what can be said about loudness. First of all, we can say that a sound is louder than another sound, even if the two sounds are completely different. This may not be true of all pairs of sounds, but certainly I can consider that a low-amplitude tone is weak compared to the sound made by a glass breaking on the floor. So there seems to be an order relationship in loudness, although perhaps a partial one. Also, it is true that scaling the acoustical wave monotonically changes the loudness of the sound. So there is definitely a relationship with the amplitude, but only in that scaling sense: loudness is not determined by simple physical quantities such as the peak pressure or the total energy.

Now it is interesting to think for a while about the notion of a sound being “not loud enough” and of a sound being “too loud”, because it appears that these two phrases do not refer to the same concept. We say that a sound is “not loud enough” when we find it hard to hear, when it is difficult to make sense of it. For example we ask someone to speak louder. Thus this notion of loudness corresponds to intelligibility, rather than acoustical energy. In particular, this is a relative notion, in the sense that intelligibility depends on the acoustical environment – background noise, other sources, reverberation, etc.

But saying that a sound is “too loud” refers to a completely different concept. It means that the sound produces an uncomfortable feeling because of its intensity. This is unrelated to intelligibility: someone screaming may produce a sound that is “too loud”, but two people screaming would also produce a sound that is “too loud”, even though intelligibility decreases. Therefore, there are at least two different notions regarding loudness: a relative notion related to intelligibility, and a more absolute one related to an unpleasant or even painful feeling. Note that it can also be said that a sound is too loud in the sense of intelligibility. For example, it can be said that the TV is too loud because it makes it hard to understand someone speaking to us. So the notion of loudness is multiform, and therefore cannot be mapped to a single scale.

Loudness as in “not loud enough” (intelligibility) is rather simple to understand. If the signal-to-noise ratio is too low, then it is more difficult to extract the relevant information from the signal, and this is what is meant by “not loud enough”. Of course there are subtleties and the relationship between the acoustical signals and intelligibility is complex, but at least it is relatively clear what it is about. In contrast, it is not so straightforward what “too loud” means. Why would a sound be unpleasant because the acoustical pressure is large?

First of all, what does it mean that something is unpleasant or painful? Something unpleasant is something that we want to avoid. But this is not a complete characterization: it is not only a piece of information that is taken into account in decision making; it has the character of an uncontrollable feeling, something that we cannot avoid being subjected to. In other words, it is an emotion. Being controlled by this emotion means acting so as to escape the unpleasant sound, for example, by putting one’s hands on the ears. Consciously trying not to act in such a way would be considered as “resisting” this emotion. This terminology implies that loudness (as in “too loud”) is an involuntary avoidance reaction of the organism to sounds, one that implies attenuating the sounds. Therefore, loudness is not only about objective properties of the external world, but also about our biological self, or more precisely about the effect of sounds on our organism.

Why would a loud sound trigger an avoidance reaction? We can speculate on different possibilities.

1) A loud sound may indicate a threat. There is indeed a known reflex called the “startle reflex”, with a latency of around 10 ms (Yeomans and Frankland, Brain Research Reviews 1996). In response to sudden, unexpected loud sounds, there is an involuntary contraction of muscles, which briefly stiffens the neck in particular. The reflex is found in all mammals and involves a short pathway in the brainstem. It is also affected by previous sounds and emotional state. However, this reflex is triggered only by a small subset of sounds, which are sudden and normally very loud (over 80 dB).

2) A very loud sound can damage the cochlea (destroy hair cells). At very high levels, it can even be painful. Note that a moderately loud sound can also damage the cochlea if it lasts long. Thus, the feeling of loudness could be related to the emotional reaction aimed at avoiding damage to the cochlea. Note that while cochlear damage depends on duration, loudness does not. That is, a continuous pure tone seems just as loud at the beginning as 1 minute into it, and yet because damage depends on continuous exposure, an avoidance reaction should be more urgent in the latter case than in the former. Even for very loud sounds, the feeling of loudness does not seem to increase with time: it may seem more and more urgent to avoid the sound, but it does not feel louder. We can draw two conclusions: 1) the feeling of loudness, or of a sound being too loud, cannot correspond to an accurate biological measurement of potential cochlear damage, as it seems to have a feeling of constancy when the sound is stationary; 2) the feeling of a sound being “too loud” probably doesn’t correspond to the urgency of avoiding that sound, since this urgency can increase (emotionally) without a corresponding increase in loudness. It could be that the emotional content (“too loud”) comes in addition to the perceptual content (a certain degree of loudness), and that only the latter is constant for a stationary sound.

3) Another possibility is that loudness correlates with the energy consumption of the auditory periphery (possibly of the auditory system in general). Indeed when the amplitude of an acoustical wave is increased, the auditory nerve fibers and most neurons in the auditory system fire more. Brain metabolism is tightly regulated, and so it is not at all absurd to postulate that there are mechanisms to sense the energy consumption due to a sound. However, this is not a very satisfying explanation of why a sound would feel “too loud”. Indeed why would the organism feel an urge to avoid a sound because it incurs a large energy consumption, when there could be mechanisms to reduce that consumption?

In this post, I have addressed two aspects of loudness: intelligibility (“not loud enough”) and emotional content (“too loud”). These two aspects are “proximal”, in the sense that they are determined not so much by the sound source as by the acoustical wave at the ear. In the next post, I will consider distal aspects of loudness, that is, those aspects of loudness that are determined by the sound source.

Neural coding and the invariance problem

In sensory systems, one of the hardest computational problems is the “invariance problem”: the same perceptual category can be associated with a large diversity of sensory signals. A classical example is the problem of recognizing a face: the same face can appear with different orientations relative to the observer, and under different lighting conditions, and it is a challenge to design a recognition system that is invariant to these sources of variation.

In computational neuroscience, the problem is usually framed within the paradigm of statistical learning theory as follows. Perceptual categories belong to some set Y (the set of faces). Sensory signals belong to some high-dimensional sensory space X (e.g. pixels). Each particular category (a particular face) corresponds to a specific set of signals in X (different views of the face) or to a distribution on X. The goal is to find the correct mapping from X to Y from particular labeled examples (a particular view x of a face, the name y corresponding to that face). This is also the view that underlies the “neural coding” paradigm, where there is a communication channel between Y and X, and X contains “information” about Y.
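
To make this framing concrete, here is a minimal toy sketch (everything in it – the prototypes, the noise model, the nearest-centroid classifier – is invented for illustration and is not meant as a model of face recognition): categories y generate noisy sensory vectors x, and the mapping from X to Y is learned from labeled examples.

    import numpy as np

    rng = np.random.default_rng(0)
    n_categories, dim = 3, 50
    prototypes = rng.normal(size=(n_categories, dim))     # one "face" per category

    def observe(y):
        """A 'view' of category y: its prototype plus nuisance variation."""
        return prototypes[y] + 0.5 * rng.normal(size=dim)

    # labeled examples (a particular view x, the category y it belongs to)
    train = [(observe(y), y) for y in rng.integers(0, n_categories, 200)]

    # learn a mapping from X to Y: here, a nearest-centroid classifier
    centroids = np.array([np.mean([x for x, y in train if y == k], axis=0)
                          for k in range(n_categories)])

    def classify(x):
        return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

    test = [(observe(y), y) for y in rng.integers(0, n_categories, 100)]
    print("accuracy:", np.mean([classify(x) == y for x, y in test]))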

Framed in this way, this is a really difficult problem in general, and it requires many examples to form categories. However, there is a different way of approaching the problem, which follows from the concept of “invariant structure” developed by James Gibson. It starts with the observation that a sensory system does not receive a static input (an image) but rather a sensory flow. This is obvious in hearing (sounds are carried by acoustic waves, which vary in time), but it is also true of vision: the eyes are constantly moving even when fixating an object (e.g. high-frequency tremors). A perceptual system is looking for things that do not vary within this sensory flow, the “invariant structure”, because this is what defines the essence of the world.

I will develop the example of sound localization. When a source produces a sound, there are time-varying acoustical waves propagating in the air, and possibly reaching the ears of a listener. The input to the auditory system is two time-varying signals. Through the sensory flow, the identity and spatial location of the source are unchanged. Therefore, any piece of information about these two things must be found in properties of the auditory signals that are invariant through the sensory flow. For example, if we neglect sound diffraction, the fact that one signal is a delayed copy of the other, with a particular delay, is true as long as the sound exists. An invariant property of the acoustic signals is not necessarily about the location of the sound source. It could be, for example, about the identity of the source (the speaker). However, if that property is no longer invariant when movements are produced by the organism, then it cannot be an intrinsic property of the source, but rather something about the relative location of the sound source.

In this framework, the computational problem of sound localization has two stages: 1) for any single example, pick up an acoustical invariant that is affected by head movements, 2) associate these acoustical invariants with sound location (either externally labeled, or defined through head movements). The second stage is essentially the computational problem defined in the neural coding/statistical learning framework. But the first stage is entirely different. It is about finding an invariant property within a single example, and this only makes sense if there is a sensory flow, i.e., if time is involved within a single example and not just across examples.

There is a great benefit in this approach, which is to solve part of the invariance problem from the beginning, before any category is assigned to an example. For example, a property about the binaural structure produced by a broadband sound source at a given position will also be true for another sound source at the same position. In this case, the invariance problem has disappeared entirely.

Within this new paradigm, the learning problem is now: given a set X of time-varying sensory signals produced by sound sources, how to find a mapping from X to some other space Y such that the images of sensory signals through this mapping do not vary over time, but vary across sources? Phrased in this way, this is essentially the goal of slow feature analysis. However, slow feature analysis is a machine learning technique, whose biological instantiation is not straightforward.
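
To make the objective concrete, here is a minimal linear version of slow feature analysis on a toy mixture (a sketch only: the signals, the mixing and the constants are invented, and the generalized-eigenvalue solution is the machine-learning route, not a biological mechanism).

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)
    t = np.linspace(0, 20, 4000)
    slow = np.sin(0.5 * t)                      # slowly varying latent variable (the "source")
    fast1 = rng.normal(size=t.size)             # fast nuisance variation
    fast2 = rng.normal(size=t.size)
    # observed time-varying signals: mixtures of the slow and fast components
    X = np.stack([slow + 0.3 * fast1, slow - 0.3 * fast1, 0.5 * fast2], axis=1)
    X -= X.mean(axis=0)

    dX = np.diff(X, axis=0)                     # temporal derivative of the signals
    A = dX.T @ dX / dX.shape[0]                 # covariance of the derivative
    B = X.T @ X / X.shape[0]                    # covariance of the signals
    # minimize temporal variation subject to unit output variance:
    # the generalized eigenvector of (A, B) with the smallest eigenvalue
    eigvals, eigvecs = eigh(A, B)
    y = X @ eigvecs[:, 0]                       # the extracted slow feature
    print("correlation with the slow source:", round(abs(np.corrcoef(y, slow)[0, 1]), 3))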

There have been similar ideas in the field. In a highly cited paper, Peter Földiak proposed a very simple unsupervised Hebbian rule based on related considerations (Földiak, Neural Comp 1991). The study focused on the development of complex cells in the visual system, which respond to edges independently of their location. The complex cell combines inputs from simple cells, which respond to specific edges, and the neuron must learn the right combination. The invariance is learned by presenting moving edges, that is, it is looked for within the sensory flow and not across independent examples. The rule is very simple: it is a Hebbian rule in a rate-based model, where the instantaneous postsynaptic activity is replaced by a moving average (a “trace”). The idea is simply that, if the output must be temporally stable, then the presynaptic activity should be paired with the output at any time. Another paper by Schraudolph and Sejnowski (NIPS 1992) is actually about finding the “invariant structure” (with no mention of Gibson) using an anti-Hebbian rule, but this means that neurons signal the invariant structure by not firing, which is not what neurons in the MSO seem to be doing (although perhaps the idea might be applicable to the LSO).
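
Below is a toy version of this kind of trace rule, loosely in the spirit of Földiak (1991) but much simplified (the stimulus set, the winner-take-all competition and all parameters are invented for illustration): simple cells are indexed by orientation and position, an edge of one orientation sweeps across positions, and a Hebbian rule using a moving average of the output lets each “complex” unit group together the simple cells that were active within the same sweep.

    import numpy as np

    rng = np.random.default_rng(1)
    n_units, n_orient, n_pos = 2, 2, 8
    w = rng.uniform(0, 1, size=(n_units, n_orient * n_pos))
    w /= w.sum(axis=1, keepdims=True)            # roughly normalized weights
    eta, delta = 0.02, 0.2                       # learning rate, trace update rate
    trace = np.zeros(n_units)

    for sweep in range(500):
        orient = rng.integers(n_orient)          # an edge of one orientation...
        trace[:] = 0.0
        for pos in range(n_pos):                 # ...sweeps across positions
            x = np.zeros(n_orient * n_pos)
            x[orient * n_pos + pos] = 1.0        # the simple cell that responds
            y = np.zeros(n_units)
            y[np.argmax(w @ x + trace)] = 1.0    # competition; the trace biases the same winner within a sweep
            trace = (1 - delta) * trace + delta * y      # moving average of the output
            w += eta * trace[:, None] * (x - w)  # Hebbian rule with the trace in place of the instantaneous output
    # ideally, each unit ends up responding to one orientation at every position
    print(np.round(w.reshape(n_units, n_orient, n_pos), 2))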

There is a more recent paper in which slow feature analysis is formally related to Hebbian rules and to STDP (Sprekeler et al., PLoS CB 2007). Essentially, the argument is that minimizing the temporal variation of the output is equivalent to maximizing the variance of the low-pass filtered output. In other words, they provide a link between slow feature analysis and Földiak’s simple algorithm. There are also constraints, in particular the synaptic weights must be normalized. Intuitively this is obvious: to aim for a slowly varying output is the same thing as to aim for increasing the low-frequency power of the signal. The angle of the paper is rather on rate models, but it gives a simple rationale for designing learning rules that promote slowness. In fact, it appears that the selection of slow features follows from the combination of three homeostatic principles: maintaining a target mean potential and a target variance, and minimizing the temporal variation of the potential (through maximizing the variance of the low-pass filtered signal). The potential may be replaced by the calcium trace of spike trains, for example.
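
A quick numerical illustration of that equivalence (with an arbitrary toy signal and an arbitrary filter time constant): at equal variance, the slower signal has both the smaller temporal variation and the larger variance after low-pass filtering.

    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 20, 4000)
    dt = t[1] - t[0]
    signals = {"slow": np.sin(0.5 * t) + 0.1 * rng.normal(size=t.size),
               "fast": rng.normal(size=t.size)}
    tau = 0.5                                    # time constant of the low-pass filter (s)

    for name, s in signals.items():
        s = (s - s.mean()) / s.std()             # compare at equal (unit) variance
        variation = np.mean(np.diff(s) ** 2)     # temporal variation of the signal
        lp = np.zeros_like(s)                    # first-order low-pass filter
        for i in range(1, s.size):
            lp[i] = lp[i - 1] + (dt / tau) * (s[i] - lp[i - 1])
        print(name, "| temporal variation:", round(variation, 4),
              "| variance of low-pass output:", round(lp.var(), 3))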

It is relatively straightforward to see how this might be applied for learning to decode the activity of ITD-sensitive neurons in the MSO into the location of the sound source. For example, a target neuron combines inputs from the MSO into the membrane potential, and the slowness principle is applied to either the membrane potential or the output spike train. As a result, we expect the membrane potential and the firing rate of this neuron to depend only on sound location. These neurons could be in the inferior colliculus, for example.

But can this principle also be applied to the MSO? In fact, the output of a single neuron in the MSO does not depend only on sound location, even for those neurons with a frequency-dependent best delay. Their output also depends on sound frequency, for example. But is it possible that their output is as slow as possible, given the constraints? It might be so, but another possibility is that only some property of the entire population is slow, and not the activity of individual neurons. For example, in the Jeffress model, only the identity of the maximally active neuron is invariant. But then we face a difficult question: what learning criterion should be applied at the level of an individual neuron so that there is a slow combination of the activities of all neurons?

I can imagine two principles. One is a backpropagation principle: the value of the criterion in the target neurons, i.e., slowness, is backpropagated to the MSO neurons and acts as a reward. The second is that the slowness criterion is applied at the cellular level not to the output of the cell, but to a signal representing a combined activity of neurons in the MSO, for example the activity of neighboring neurons.

Does free market theory support free markets?

In economics, free market theorists such as Milton Friedman have shown that, under a number of assumptions, free markets are efficient. In particular, they produce no unemployment and resources are well allocated. This is based on conceptual arguments and a good deal of mathematics, rather than on empirical evidence. The epistemology of economics is a bit peculiar compared to other sciences. Indeed, economic theories have both an empirical value (you want to account for past and future economic observations) and a prescriptive value (you want to use theoretical principles to recommend economic policies). So in the face of contradictory evidence (there is unemployment in market economies), strong supporters of free market theory argue that, since the theory is valid, it must be that real markets are not actually free, and so they should be freed of all types of regulation.

First of all, how scientists could get away with such argumentation is puzzling for anyone with an interest in epistemology. If the evidence supports the theory, then the theory is corroborated; if it doesn't, then the theory is also corroborated. This is precisely the kind of theory that Karl Popper called metaphysical: there is no way you can falsify it (like "there is a God").

But not all economists, and I would venture only a minority of economists (but perhaps not of politicians and financial executives), would argue on such a dogmatic line. Over the years, leading economists have identified a number of ways in which real markets do not and cannot comply with the assumptions of free market theory. For example, people are not rational in the sense of that theory (which postulates that you can predict all the consequences of your actions, at least in a probabilistic way), there is neither perfect nor symmetrical information between economic agents, competition is not always guaranteed, and there are externalities (consequences of individual decisions that impact agents not involved in the decision process).

All this is well known, at least in the economic field. However what most people do not realize, I believe, is that even at a conceptual level, free market theory does not actually support free markets.

One basic result of free market theory is that, if agents are only motivated by self-interest and there is complete information and fair competition, then profit should be very small in any transaction. Indeed, if an economic agent were selling a product with a very large profit margin, then soon enough another economic agent would sell the same product at a lower price and still make a sizeable profit. Free marketeers usually stop here: great, in a free market economy, prices reflect the fair value of products. But let us not stop here and examine the consequences of this result. If agents are motivated by self-interest, and the result of fair competition and complete information is that no profit is made, then agents are directly incentivized to create monopolies and to hide or manipulate information, and they will avoid any situation in which they cannot do so. As a consequence, some agents make a large profit at the expense of global economic efficiency. The evidence for such behavior is everywhere, and mostly in legal forms. An obvious example of manipulating information is advertising, which is produced by the very companies that make the products. The goal of advertising is precisely to have a biased influence on the decisions of customers. Another one would be selling credit to customers against their own interests. Examples of monopoly-seeking behavior are many: territorial intellectual property strategies (i.e. patenting so as to own a particular sector rather than for immediate exploitation) and patenting in general, monopolies in operating systems, and of course illegal agreements on prices in certain economic sectors. Creating monopolies is precisely the purpose of marketing, which is to differentiate the company's products from the rest: to present a product in such a specific way that the company is the only one to produce it. As a result, prices can go up because there is no competition, and no other company has any interest in entering the competition, since doing so would make prices drop and generate no profit. The healthcare system in the US is another example: a system where prices are freely set by a market with captive customers fearing for their lives, resulting in the most expensive system in the world by far, and yet not at all the most efficient.

Free market theory demonstrates that in a free market economy, economic agents should adopt monopolistic and manipulative strategies that go against global economic efficiency. This is the part of free market theory that has empirical support.

What is sound? (IX) Sound localization and vision

In this post, I want to come back to a remark I made in a previous post on the relationship between vision and spatial hearing. It appears that my account of the comparative study of Heffner and Heffner (Heffner & Heffner, 1992) was not accurate. Their findings are in fact even more interesting than I thought. They find that sound localization acuity across mammalian species is best predicted not by visual acuity, but by the width of the field of best vision.

Before I comment on this result, I need to explain a few details. Sound localization acuity was measured behaviorally in a left/right discrimination task near the midline, with broadband sounds. The authors report this discrimination threshold for 23 mammalian species, from gerbils to elephants. They then try to relate this value to various other quantities: the largest interaural time difference (ITD), which is directly related to head size; visual acuity (highest angular density of retinal cells); whether the animals are predators or prey; and the field of best vision. The latter quantity is defined as the angular width of the retina in which angular cell density is at least 75% of the highest density. So this quantity is directly related to the inhomogeneity of cell density in the retina.
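
As an illustration of this definition (a toy sketch: the two density profiles below are invented, not actual retinal data), the width of the field of best vision can be read off a cell density profile as follows.

    import numpy as np

    angles = np.linspace(-100, 100, 2001)        # eccentricity (degrees)
    # hypothetical retinal cell density profiles (arbitrary units)
    profiles = {"strong fovea (cat-like)": np.exp(-(angles / 10.0) ** 2),
                "homogeneous retina (gerbil-like)": np.exp(-(angles / 120.0) ** 2)}

    def best_field_width(density, angles, fraction=0.75):
        """Angular width over which density is at least 75% of its peak."""
        above = angles[density >= fraction * density.max()]
        return above.max() - above.min()

    for name, d in profiles.items():
        print(name, "->", round(best_field_width(d, angles), 1), "degrees")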

The results of the comparative study are not straightforward (I find). Let us consider a few hypotheses. One hypothesis goes as follows. Sound localization acuity is directly related to the temporal precision of firing of auditory nerve fibers. If this precision is similar for all mammals, then this should correspond to a constant ITD threshold. In terms of angular threshold, sound localization acuity should then be inversely proportional to the largest ITD, and to head size. The same reasoning would go for intensity differences. Philosophically speaking, this corresponds to the classical information-processing view of perception: there is information about sound direction in the ITD, as reflected in the relative timing of spikes, and so sound direction can be estimated with a precision that is directly related to the temporal precision of neural firing. As I have argued many times in this blog, the flaw in the information-processing view is that information is defined with respect to an external reference (sound direction), which is accessible for an external observer. Nothing in the spikes themselves is about space: why would a difference in timing between two specific neurons produce a percept of space? It turns out that, of all the quantities the authors looked at, largest ITD is actually the worst predictor of sound localization acuity. Once the effect of best field of vision is removed, it is essentially uncorrelated (Fig. 8).
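
To spell out the arithmetic behind this first hypothesis (a sketch with invented numbers: the fixed 20 µs threshold and the head sizes are placeholders, and the formula ITD ≈ (d/c)·sin θ neglects diffraction by the head): a constant ITD threshold translates into an angular threshold that shrinks as the head, and hence the largest ITD, gets bigger.

    import numpy as np

    c = 343.0                                    # speed of sound (m/s)
    itd_threshold = 20e-6                        # hypothetical fixed neural precision (s)

    def angular_threshold(head_diameter):
        """Smallest angle from the midline whose ITD reaches the fixed threshold."""
        largest_itd = head_diameter / c          # ITD for a source at 90 degrees
        return np.degrees(np.arcsin(min(1.0, itd_threshold / largest_itd)))

    for name, d in [("gerbil", 0.03), ("cat", 0.10), ("human", 0.22), ("elephant", 0.60)]:
        print(name, round(angular_threshold(d), 2), "degrees")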

A second hypothesis goes as follows. The auditory system can estimate the ITD of sounds, but to interpret this ITD as the angle of the sound source requires calibration (learning), and this calibration requires vision. Therefore, sound localization acuity is directly determined by visual acuity. At first sight, this could be compatible with the information-processing view of perception. However, the sound localization threshold is determined in a left/right localization task near the midline, and in fact this task does not require calibration. Indeed, one only needs to know the sign of the ITD, that is, which ear leads. Therefore, in the information-processing view, sound localization acuity should still be related to the temporal precision of neural “coding”. To make this hypothesis compatible with the information-processing view requires an additional evolutionary argument, which goes as follows. The sound localization system is optimized for a different task, absolute (not relative) localization, which requires calibration with vision. Therefore the temporal precision of neural firing, or of the binaural system, should match the required precision for that task. The authors find again that, once the effect of best field of vision is removed, visual acuity is essentially uncorrelated with sound localization acuity (Fig. 8).

Another evolutionary hypothesis could be that sound localization acuity is tuned to the particular needs of the animal. So a predator, like a cat, would need a very accurate sound localization system to be able to find a prey that is hiding. A prey would probably not require such high accuracy to be able to escape from a predator. An animal that is neither a prey nor a predator, like an elephant, would also not need high accuracy. It turns out that the elephant has one of the lowest localization thresholds of all mammals. Again, there is no significant correlation once the best field of vision is factored out.

In this study, it appears rather clearly that the single quantity that best predicts sound localization acuity is the width of the best field of vision. First of all, this goes against the common view of the interaction between vision and hearing. According to this view, the visual system localizes the sound source, and this estimation is used to calibrate the sound localization system. If this were right, we would rather expect that localization acuity corresponds to visual acuity.

In terms of function, the results suggest that sound localization is used by animals to move their eyes so that the source is in the field of best vision. There are different ways to interpret this. The authors seem to follow the information-processing view, with the evolutionary twist: sound localization acuity reflects the precision of the auditory system, but that precision is adapted for the function of sound localization. One difficulty with this interpretation is that the auditory system is also involved in many other tasks that are unrelated to sound localization, such as sound identification. Therefore, only the precision of the sound localization system should be tuned to the difficulty of the task, for example the size of the medial superior olive, which is involved in the processing of ITDs. However, when thinking of intensity rather than timing differences, this view seems to imply that the precision of encoding of monaural intensities should be tuned to the difficulty of the binaural task.

Another difficulty comes from studies of vision-deprived or blind animals. There are a few such studies, and they tend to show that sound localization acuity actually gets better. This could not occur if sound localization acuity reflected genetic limitations. The interpretation can be saved by replacing evolution with development. That is, the sound localization system is tuned during development to reach a precision appropriate for the needs of the animal. For a sighted animal, these needs would be moving the eyes to the source, but for a blind animal it could be different.

An alternative interpretation that rejects the information-processing view is to consider that the meaning of binaural cues (ITD, ILD) can only come from what they imply for the animal, independently of the “encoding” precision. For a sighted animal, observing a given ITD would imply that moving the eyes or the head by a specific angle would put a moving object in the best field of view. If perceiving direction is perceiving the movement that must be performed to put the source in the best field of view, then sound localization acuity should correspond to the width of that field. For a blind animal, the connection with vision disappears, and so binaural cues must acquire a different meaning. This could be, for example, the movements required to reach the source. In this case, sound localization acuity could well be better than for a sighted animal.

In more operational terms, learning the association between binaural cues and movements (of the eyes or head) requires a feedback signal. In the calibration view, this feedback is the error between the predicted retinal location of the sound source and the actual location, given by the visual system. Here the feedback signal would rather be something like the total amount of motion in the visual field, or its correlation with sound, a quantity that would be maximized when the source is in the best field of vision. This feedback is more like a reward than a teacher signal.

Finally, I suggest a simple experiment to test this hypothesis. Gerbils have a rather homogeneous retina, with a best field of vision of 200°. Accordingly, their sound localization threshold is large, about 27°. The hypothesis would predict that, if gerbils were raised with an optical system (glasses) that creates an artificial fovea (enlarging a central part of the visual field), then their sound localization acuity should improve. Conversely, for an animal with a small field of best vision (cats), using an optical system that enlarges the field of view should degrade sound localization acuity. Finally, in humans with corrected vision, there should be a correlation between the type of correction and sound localization acuity.

This discussion also raises two points I will try to address later:

- If sound localization acuity reflects visual factors, then it should not depend on properties of the sound, as long as there are no constraints in the acoustics themselves (e.g. a pure tone may provide ambiguous cues).

- If sound localization is about moving the eyes or the head, then how about the feeling of distance, and other aspects of spatial hearing?


What does it mean to represent a relation?

In this blog, I have argued many times that if there are neural representations, these must be about relations: for example, a relation between two sensory signals, or between a potential action and its effect on the sensory signals. But what does it mean exactly that something (say, neural activity) “represents” a relation? It turns out that the answer is not so obvious.

The classical way to understand it is to consider that a representation is something (an event, a number of spikes, etc.) that stands for the thing being represented. That is, there is a mapping between the thing being represented and the thing that represents it. For example, in the Jeffress model of sound localization, the identity of the most active binaural neuron stands for the location of the sound, or in terms of relation, for the fact that the right acoustical signal is a delayed copy of the left acoustical signal, with a specific delay. The difficulty here is that a representation always involves three elements: 1) the thing to be represented, 2) the thing that represents it, 3) the mapping between the first two things. But in the classical representational view, we are left with only the second element. In what sense does the firing of a binaural neuron tell us that there is such a specific relation between the monaural signals? Well, it doesn’t, unless we already know in advance that this is what the firing of that neuron stands for. But from observing the firing of the binaural neurons, there is no way we can ever know that: we just see neurons lighting up sometimes.
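
For concreteness, here is a toy rate-based sketch of the Jeffress picture mentioned above (the signals, sampling rate and delays are invented; this is a caricature, not a biophysical model): an array of binaural units, each combining the right signal with the left signal delayed by a different internal delay, and the identity of the most active unit stands for the ITD.

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 40000                                   # sampling rate (Hz)
    left = rng.normal(size=4000)                 # broadband left-ear signal
    true_itd = 20                                # right ear lags by 20 samples = 500 us
    right = np.roll(left, true_itd)

    internal_delays = np.arange(0, 41)           # one binaural unit per internal delay
    # each unit's "activity": coincidence of the delayed left signal with the right signal
    activity = np.array([np.sum(np.roll(left, d) * right) for d in internal_delays])
    best = internal_delays[np.argmax(activity)]
    print("most active unit has internal delay", best, "samples =", 1e6 * best / fs, "us")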

There are different ways to address this issue. The simplest one is simply to say: it doesn’t matter. The activity of the binaural neurons represents a relationship between the monaural signals, at least for us external observers, but the organism doesn’t care: what matters is that their activity can be related to the location of the sound source, defined for example as the movement of the eyes that puts the sound source in the fovea. In operational terms, the organism must be able to take an action conditionally on the validity of a given relation, but what this relation exactly is in terms of the acoustical signals doesn’t matter.

An important remark is in order here. There is a difference between representing a relation and representing a quantity (or vector), even in this simple notion of representation. A relation is a statement that may be true or not. This is different from a quantity resulting from an operation. For example, one may always calculate the peak lag of the cross-correlation function between two acoustical signals, and call this “the ITD” (interaural time difference). But such a number is obtained whether there is one source, several sources or no source at all. Thus, this is not the same as relations of the form: the right signal equals the left signal delayed by 500 µs. Therefore, we are not speaking of a mapping between acoustical signals and action, which would be unconditional, but of actions conditional on a relation in the acoustical signals.
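
A small sketch of this distinction (toy signals; the 0.9 criterion is an arbitrary choice): the peak lag of the cross-correlation can always be computed, even when no delayed-copy relation holds, whereas the relation itself is a statement that is either true or false.

    import numpy as np

    rng = np.random.default_rng(1)
    left = rng.normal(size=4000)

    def peak_lag_and_relation(left, right, max_lag=40, criterion=0.9):
        lags = np.arange(-max_lag, max_lag + 1)
        corr = np.array([np.dot(np.roll(left, lag), right) for lag in lags])
        corr /= np.linalg.norm(left) * np.linalg.norm(right)
        peak_lag = lags[np.argmax(corr)]         # always defined, relation or not
        relation_holds = corr.max() > criterion  # is the right signal a delayed copy?
        return peak_lag, relation_holds

    delayed_copy = np.roll(left, 20)             # the relation holds (delay of 20 samples)
    unrelated = rng.normal(size=4000)            # no common source: the relation is false
    print(peak_lag_and_relation(left, delayed_copy))   # (20, True)
    print(peak_lag_and_relation(left, unrelated))      # some lag, False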

Now there is another way to understand the phrase “representing a relation”, which is in a predictive way: if there is a relation between A and B, then representing the relation means that from A and the relation, it is possible to predict B. For example: saying that the right signal is a delayed copy of the left signal, with delay 500 µs, means that if I know that the relation is true and I have the left signal, then I can predict the right signal. In the Jeffress model, or in fact in any model that represents the relation in the previous sense, it is possible to infer the right signal from the left signal and the representation, but only if the meaning of that representation is known, i.e., if it is known that a given neuron firing stands for “B comes 500 µs after A”. This is an important distinction with the previous notion of representation, where the meaning of the relation in terms of acoustics was irrelevant.

We now have a substantial problem: where does the meaning of the representation come from? The firing of binaural neurons in itself does not tell us anything about how to reconstruct signals. To see the problem more clearly, imagine that the binaural neurons develop by selecting axons from both sides. In the end there is a set of binaural neurons whose firing stands for binaural relations with different ITDs. But by just looking at the activity of the binaural neurons after development, or at the activity of both the binaural neurons and the left monaural acoustical signal, it is impossible to know what the ITD is, or what the right acoustical signal is at any time. To be able to do this, one actually needs to have learned the meaning of the representation carried by the binaural neurons, and this learning seems to require both monaural inputs.

It now seems that this second notion of representation is not very useful, since in any case it requires all terms of the relation. This brings us to the notion that to represent a relation, and not just a specific instantiation of it (i.e., these particular signals have such a property), it must be represented in a sense that may apply to any instantiation. For example, if I know that a source is at a given location, I can imagine for any left signal what the right signal should be. Or, given a signal on the left, I can imagine what the right signal should be if the source were at a given location.

I’m ending this post with probably more confusion than when I started. This is partly intended. I want to stress here that once we start thinking of perceptual representations in terms of relations, then classical notions of neural representations quickly seem problematic or at least insufficient.

Information about what?

In a previous post, I pointed out that the word “information” is almost always used in neuroscience in the sense of information theory, and this is a very restricted notion of information that leads to dualism in many situations. There is another way to look at this issue, which is to ask the question: information about what?

In discussions of neural information or “codes”, there are always three elements involved: 1) what carries the information, e.g. neural electrical activity, 2) what the information is about (e.g. the orientation of a bar), 3) the correspondence between the first two elements. If dualism is rejected, then all the information an organism ever gets from the world must come from its own senses (and the effect of actions on them). Therefore, if one speaks of information for the organism, as opposed to information for an external observer, then the key point to consider is that the information should be about something intrinsic to the organism. For example, it should not be about an abstract parameter of an experimental protocol.

So what kind of information are we left with? For example, there may be information in one sensory signal about another sensory signal, in the sense that one can be (partially) predicted from the other. Or there can be information in a sensory signal about the future signal. This is equivalent to saying that the signals follow some law, a theme developed for example by Gibson (invariant structure) and O’Regan (sensorimotor contingency).

One might think that this conception of information would imply that we can’t know much about the world. But this is not true at all, because there is knowledge about the world coming from the interaction of the organism with the world. Consider space, for example. A century ago, Poincaré noted that space, with its topology and structure, can be entirely defined by the effect of our own movements on our senses. To simplify, assume that the head and eyes are fixed in our body and we can move only by translations, although with possibly complex movements. We can go from point A to point B through some movements. Points A and B differ by the visual inputs. Movements act on the set of points (= visual inputs) as a group action that has the structure of a two-dimensional Euclidean space (for example, for each movement there is an opposite movement that takes you back to the previous point, combinations of movements are commutative, etc.). This defines space as a two-dimensional affine space. In fact, Poincaré (and also Einstein) went further and noted that space as we know it is necessarily defined with respect to an observer. For example, we can define absolute Cartesian coordinates for space, but the reference point is arbitrary: only relative statements are actually meaningful.
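
Here is a toy sketch of this construction (the grid “world” and the patch-based “visual input” are invented for illustration): points are identified with visual inputs, movements act on them, and the group properties Poincaré appeals to can be checked directly.

    import numpy as np

    rng = np.random.default_rng(0)
    world = rng.normal(size=(60, 60))            # some environment

    def visual_input(point):
        """The sensory input received at a given point (a local patch of the world)."""
        x, y = point
        return world[x:x + 5, y:y + 5]

    def move(point, movement):
        """A movement (translation) takes one point to another."""
        return (point[0] + movement[0], point[1] + movement[1])

    p = (25, 25)                                 # a point, known to us only by its visual input
    a, b = (3, -2), (-1, 4)                      # two movements

    # combinations of movements are commutative: a then b gives the same visual input as b then a
    print(np.array_equal(visual_input(move(move(p, a), b)),
                         visual_input(move(move(p, b), a))))
    # every movement has an opposite that takes you back to the previous visual input
    print(np.array_equal(visual_input(move(move(p, a), (-a[0], -a[1]))),
                         visual_input(p)))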

In summary, it is not so much that the concept of information or code is completely irrelevant in itself. The issues arise when one speaks of codes about something external to the organism. In the end, this is nothing other than a modern version of dualism (as Dennett pointed out with his “Cartesian theater”). Rejecting dualism implies that any information relevant to an organism must be about something that the organism can do or observe, not about what an external observer can define.

Rate vs. timing (XX) Flavors of spike-based theories (6) Predictive coding and spike-based inference

In two companion papers in Neural Computation (followed by a related paper on working memory), Sophie Denève developed a spike-based theory of Bayesian inference. It can be categorized as a representational spike-based theory, in the sense that spikes collectively represent some objective variable of the world, for which there is some uncertainty. It follows a typical Marr-ian approach, in which the function of the neural network (level 1) is first postulated, in terms of external properties of the world, and then the properties of the network (dynamics, connectivity) are derived. But unlike Marr’s approach, the algorithmic and physical levels are not considered independent, that is, the algorithm is defined directly at the level of spikes. In the first paper, it is assumed that a neuron codes for a hidden binary variable, corresponding to an external property of the world. The neuron must infer that variable from a set of observations, which are independent Poisson inputs whose rates depend on the binary value of the variable. The neuron codes for other neurons (as opposed to the external observer), that is, it is postulated that the log odds of the hidden variable are estimated from the spike train produced by the neuron, as a sum of PSPs in a target neuron. Thus, the decoding process is fixed, and the dynamics of the neuron can then be deduced from an optimization principle, that is, so that the decoded quantity is as close as possible to the true quantity.

One can write a differential equation, driven by the input spikes, that describes how the log-odds evolve. At any time, the neuron can also compute a decoded estimate of the log-odds from its own output spike train. A spike is then produced if it brings this estimate closer to the true value of the log-odds, calculated from the inputs.
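
A minimal sketch of this scheme, with toy parameters throughout (it only illustrates the “spike when it reduces the decoding error” rule, not the exact equations of the papers): the “true” log-odds are integrated from the Poisson inputs, the decoded estimate is a sum of decaying PSPs, and a spike is fired whenever adding a PSP brings the decoded estimate closer to the true value.

    import numpy as np

    rng = np.random.default_rng(0)
    dt, T = 0.001, 2.0                           # time step and duration (s)
    steps = int(T / dt)
    tau = 0.02                                   # decay time constant of log-odds and PSPs (s)
    psp = 1.0                                    # effect of one output spike on the decoded value

    hidden = np.zeros(steps)
    hidden[500:1500] = 1.0                       # the hidden variable switches on, then off
    r_on, r_off = 100.0, 20.0                    # input Poisson rates (Hz) in the two states
    inputs = rng.random(steps) < dt * np.where(hidden == 1, r_on, r_off)

    log_odds = 0.0                               # "true" log-odds, driven by the inputs
    decoded = 0.0                                # log-odds decoded from the output spike train
    spike_times = []
    for i in range(steps):
        log_odds += -(dt / tau) * log_odds + (np.log(r_on / r_off) if inputs[i] else 0.0)
        decoded += -(dt / tau) * decoded         # PSPs decay between output spikes
        # spiking as a timed decision: fire only if a PSP brings the decoded value closer to the truth
        if abs(decoded + psp - log_odds) < abs(decoded - log_odds):
            decoded += psp
            spike_times.append(i * dt)

    print(len(spike_times), "output spikes; fraction inside the 'on' period:",
          round(np.mean([0.5 < s < 1.5 for s in spike_times]), 2))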

In this proposition, the spiking process is deterministic and spiking is seen as a timed decision. This is completely different from rate-based theory, in which spikes are random and instantiate a time-varying quantity. Although this is a rather abstract sensory scenario, the idea that spikes could be corrective signals is powerful. It connects to the point I made in the previous post, that the key point in spike-based theories is not temporal precision or reproducibility, as is sometimes wrongly claimed, it is the fact that spikes between different neurons are coordinated. When this theory is extended to a population of neurons, an immediate consequence is that spiking decisions are dependent on the decisions made by other neurons, and it follows that although spiking is deterministic at the cellular level and precise at the functional level (estimation of the hidden variable), it may not be reproducible between trials. In fact, even the relative timing of neurons may not be reproducible – for exactly the same reason as in sparse coding theory.

What is computational neuroscience? (X) Reverse engineering the brain

One phrase that occasionally pops up when speaking of the goal of computational neuroscience is “reverse engineering the brain”. This is quite an interesting phrase from an epistemological point of view. The analogy is to see the brain as an engineered device, the “engineer” being evolution, of which we do not possess the design plans. We are supposed to understand it by opening it, and trying to guess what mechanisms are at play.

What is interesting is that observing and trying to understand the mechanisms is basically what science is about, not only neuroscience, so there must be something else in this analogy. For example, we would not describe the goal of astronomy as reverse engineering the planets. What is implied in the phrase is the notion that there is a plan, and that this plan is meant to achieve a function. It is a reference to the teleonomic nature of life in general, and of the nervous system in particular: the brain is not just a soup of neurons, these neurons coordinate their action so as to achieve some function (to survive, to reproduce, etc).

So the analogy is meaningful from this point of view, but as any analogy it has its limits. Is there no difference between a living being and an engineered artifact? This question touches on what life is, which is a very broad question, but here I will just focus on two differences that I think are relevant for the present matter.

There is one very important specificity that was well explained by the philosopher Humberto Maturana (“The Organization of the living”, 1974). Engineered things have a structure that is designed so as to fulfill some function, that is, they are made of specific components that have to be arranged in a specific way, according to a plan. So all you need to understand is the structure, and its relation with the function. But as Maturana pointed out, living things have a structure (the body, the wiring of neurons, etc) but they also have an organization that produces that structure. The organization is a set of processes that produce the structure, which is itself responsible for the organization. But what defines the living being is its organization, not its structure, which can change. In the case of the nervous system, the wiring between neurons changes dramatically in the course of life, or even in the course of one hour, and the living being remains the same. The function of the organization is to maintain the conditions for its existence, and since it exists in a body interacting with an external environment, it is in fact necessary that the structure changes so as to maintain the organization. This is what is usually termed “plasticity” or “learning”. Therefore living things are defined by their organization, while engineered things are defined by their structure.

This is one aspect in which the engineering analogy is weak, because it misses this important distinction. Another one is that an engineered thing is made by an engineer, that is, by someone external to the object. Therefore the function is defined with respect to an external point of view. The plan would typically include elements that are defined in terms of physics, concepts that can only be grasped and measured by some external observer with appropriate tools. But a living organism only has its own senses and ways of interacting with the environment to make sense of the world. This is true of the nervous system as a whole, but also of individual cells: a cell has ways of interacting with other cells and possibly with the outside world, but it does not have a global picture of the organism. For example, an engineer's plan would specify where each component should go, e.g. with Euclidean coordinates. But this is not how development can work in a living thing. Instead, the plan should come in the form of mechanisms that specify not “where” a thing is, but rather “how to get there”, or perhaps even when a component should transform into a new component – specific ways of interacting that end up producing the desired result.

Therefore the nature of the “plan” is really quite different from the plan of an engineer. To make my point, I will draw an analogy with philosophy of knowledge. A plan is a form of knowledge, or at least it includes some knowledge. For example, if the plan includes the statement “part A should be placed at such coordinates”, then the organism that executes the plan must have implicit knowledge of Euclidean geometry. For an engineer, knowledge comes from physics, and is based on the use of specific tools to measure things in the world. But for a cell, knowledge about the world comes just from the interaction with the world: different ways to sense it (e.g. incoming spikes for a neuron), different ways to act on it (e.g. producing a spike, releasing some molecules in the extracellular medium). A plan can be specified in terms of physics if it is to be executed by an engineer, but it cannot be specified in these terms if it is to be executed by a cell: instead, it would be specified in terms of mechanisms that make sense given the ways the cell can interact with the world. The implicit knowledge about the world that is included in an engineer's plan is what I would call “metaphysical knowledge”, in relationship with the corresponding notion in philosophy of science.

Science is made of universal statements, such as the law of gravitation. But not all statements are scientific, for example “there is a God”. In philosophy of science, Karl Popper proposed that a scientific statement is one that can potentially be falsified by an observation, whereas a metaphysical statement is a statement that cannot be falsified. For example, the statement “all penguins are black” is scientific, because I could imagine that one day I see a white penguin. On the other hand, the statement “there is a God” is metaphysical, because there is no way I can check. Closer to the matter of this text, the statement “the world is actually five-dimensional but we live in a three-dimensional subspace” is also metaphysical because independently of whether it is true or not, we have no way to confirm it or to falsify it given the way we interact with the world.

So what I call “metaphysical knowledge” in an engineer plan is knowledge that cannot be corroborated or falsified by the organism that executes the plan, given its senses and possibilities for action. For example, consider the following statement: neurons in the lateral geniculate nucleus project to the occipital region of the brain. This includes metaphysical knowledge about where that region is, which is specified from the point of view of an external observer. This cannot be a biological plan. Instead, a biological plan would rather have to specify what kind of interaction a growing axon should have with its environment in order to end up in the desired region.

In summary, although the phrase “reverse engineering” acknowledges the fact that, contrary to physical things of nature such as planets, living things have a function, it misses several important specificities of life. One is that living things are defined by their organization, rather than by the changing structure that the organization produces, while engineered things are defined by their structure. Another one is that the “plan”, which defines that organization, is of a very different nature than the plan made by and for an engineer, because in the latter case the function and the design are conceived from an external point of view, which generally includes “metaphysical knowledge”, i.e., knowledge that cannot be grasped from the perspective of the organism.

Rate vs. timing (XIX) Spike timing precision and sparse coding

Spike-based theories are sometimes discarded on the basis that spike timing is not reproducible in vivo, in response to the same stimulus. I already argued that, in addition to the fact that this is a controversial statement (because for example this could be due to a lack of control of independent variables such as attentional state), this is not a case for rate-based theories but for stochastic theories.

But I think it also reveals a misunderstanding of the nature of spike-based theories, because in fact, even deterministic spike-based theories may predict irreproducible spike timing. Underlying the noise argument is the assumption that spikes are produced by applying some operation to the stimulus and then thresholding the result. If the timing of these spikes is not reproducible between trials, so the argument goes, then there must be noise inserted at some point in the operation. However, spike-based theories, at least some of them, do not fit this picture. Rather, the hypothesis is that spikes produced by different neurons are coordinated so as to produce some function. But then there is no reason why spikes need to be produced at the same time by the same neurons in all trials in order to produce the same global result. What matters is that spikes are precisely coordinated, which means that the firing of one neuron depends on the previous firing of other neurons. So if one neuron misses a spike, for example, then it will affect the firing of other neurons, precisely so as to make the computation more reliable. In other words, the hypothesis of precise spike-based coordination implies that the firing of a spike by a single neuron should impact the firing of all other neurons, which makes individual firing non-reproducible.

The theory of sparse coding is in line with this idea. In this theory, it is postulated that the stimulus can be reconstructed from the firing of neurons. That is, each spike contributes a “kernel” to the reconstruction, at the time of the spike, and all such contributions are added together so that the reconstruction is as close as possible to the original stimulus. Note how this principle is in some way the converse of the previously described principle: the spikes are not described as the result of a function applied to the stimulus, but rather the stimulus is described as a function of the spikes. So spike encoding is defined as an inverse problem. This theory has been rather successful in explaining receptive fields in the visual (Olshausen) and auditory (Lewicki) systems. It is also meant to make sense from the point of view of minimizing energy consumption, as it minimizes the number of spikes required to encode the stimulus with a given precision. There are two interesting points here, regarding our present discussion. First, it appears that spikes are coordinated in the way I just described above: if one spike is missed, then the other spikes should be produced so as to compensate for this loss, which means there is a precise spike-based coordination between neurons. Second, the pattern of spikes is seen as a solution to an inverse problem. This implies that if the problem is degenerate, then there are several solutions that are equally good in terms of reconstruction error. Imagine for example that two neurons contribute exactly the same kernel to the reconstruction – which is not useless if one considers the fact that firing rate is limited by the refractory period. Then on any given trial, either of these two neurons may spike. From the observer's point of view, this represents a lack of reproducibility. However, this lack of reproducibility is precisely due to the fact that there is a precise spike-based coordination between neurons: to minimize the reconstruction error, just one of the two neurons should be active, and its timing should be precise too.
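
Here is a minimal sketch of this kind of encoding (a greedy, matching-pursuit-style decomposition on a toy signal; the kernels, the number of spikes and the random tie-breaking are all invented for illustration). Two of the units share exactly the same kernel, so which of them “spikes” on a given trial is arbitrary, even though the reconstruction error, and hence the coordination between spikes, is the same on every trial.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    kernel = np.exp(-0.5 * (np.arange(-20, 21) / 5.0) ** 2)   # a smooth bump
    kernel /= np.linalg.norm(kernel)
    # two units with identical kernels, plus one unit with a different (flat) kernel
    kernels = [kernel, kernel.copy(), np.ones(41) / np.sqrt(41)]

    # the stimulus to be reconstructed: two bumps at different times
    stim = np.zeros(n)
    stim[60:101] += 2.0 * kernel
    stim[140:181] += 1.5 * kernel

    def encode(stim, kernels, n_spikes=2):
        """Greedy encoding: each 'spike' is the (unit, time, amplitude) that best
        reduces the residual; exact ties between the identical kernels are broken
        at random, which is where the trial-to-trial variability comes from."""
        residual = stim.copy()
        spikes = []
        for _ in range(n_spikes):
            scores = np.array([np.correlate(residual, k, mode="valid") for k in kernels])
            ties = np.flatnonzero(scores == scores.max())
            unit, time = np.unravel_index(rng.choice(ties), scores.shape)
            amp = scores[unit, time]
            residual[time:time + len(kernels[unit])] -= amp * kernels[unit]
            spikes.append((unit, int(time), amp))
        return spikes, residual

    for trial in range(3):
        spikes, residual = encode(stim, kernels)
        print("trial", trial, "spikes (unit, time):", [(u, t) for u, t, _ in spikes],
              "reconstruction error:", round(float(np.linalg.norm(residual)), 6))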

Sparse coding with spikes also implies that reproducibility should depend on the stimulus. That is, a stimulus that is highly redundant, such as a sinusoidal grating, makes the inverse problem degenerate, leading to a lack of reproducibility of spikes, precisely because of the coordination between spikes; a stimulus that is highly informative, such as a movie of a natural scene, should lead to higher reproducibility of spikes. Therefore, in the sparse coding framework, the spike-based coordination hypothesis predicts, contrary to rate-based theories, that spike time reproducibility should depend on the information content of the stimulus – in the sense that a more predictable stimulus leads to more irreproducible spiking. But even when spiking is not reproducible, it is still precise.

Rate vs. timing (XVIII) Spiking as analog-digital conversion: the evolutionary argument

Following on the previous post, with the analog-digital analogy often comes the idea that the relation between rates and spikes is that of an analog-digital conversion. Or spikes are seen as an analog-digital conversion of the membrane potential. I believe this comes from the evolutionary argument that spikes seem to have appeared for fast propagation of information over long distances, and not because there is anything special about them in terms of computation. It is quite possible that this was indeed the evolutionary constraint that led to the appearance of action potentials (although this is pure speculation), but even if it is true, the reasoning is wrong: for example, the ability of humans to make tools might have developed because they stood up. Yet standing up does not explain tool-making at all. So standing up allows new possibilities, but these possibilities follow a distinct logic. Spikes might have appeared primarily to transmit information over long distances, but once they are there, they have properties that are used, possibly for other purposes, in new ways. In addition, the fact that spikes appeared to transmit information, and that this information was analog, does not mean that information is still used in the same way. Consider: to transmit information over long distances, one uses Morse code on the telegraph. Do you speak to the telegraph? No, you change the code and use a discrete code that has little connection with the actual sound wave. Finally, even if all this makes sense, it still is not an argument in favor of rate-based theories, because rate is an abstract quantity that is derived from spikes. So if we wanted to make the case that spikes are only there to carry a truly analog value, the membrane potential, then it would lead us to discard spikes as a relevant descriptive quantity, and a fortiori to discard rates as well. From a purely informational viewpoint (in the sense of Shannon), the spikes produced by a neuron carry less information than its membrane potential, but the rate carries even less information, since it is abstracted from spikes.