What is computational neuroscience? (XVIII) Representational approaches in computational neuroscience

Computational neuroscience is the science of how the brain “computes”: how it recognizes faces or identifies words in speech. In computational neuroscience, standard approaches to perception are representational: they describe how neural networks represent in their firing some aspect of the external world. This means that a particular pattern of activity is associated with a particular face. But who makes this association? In the representational approach, it is the external observer. The approach only describes a mapping between patterns of pixels (say) and patterns of neural activity. The key step, relating the pattern of neural activity to a particular face (which is in the world, not in the brain), is done by the external observer. How then is this about perception?

This is an intrinsic weakness of the concept of a “representation”: a representation is something (a painting, etc.) that has a meaning for some observer; it says nothing about how this meaning is formed. Ultimately, it does not say much about perception, because it simply replaces the problem of how patterns of photoreceptor activity lead to perception by the problem of how patterns of neural activity lead to perception.

A simple example is the neural representation of auditory space. There are neurons in the auditory brainstem whose firing is sensitive to the direction of a sound source. One theory proposes that the sound’s direction is signaled by the identity of the most active neuron (the one that is “tuned” to that direction). Another one proposes that it is the total firing rate of the population, which covaries with direction, that indicates sound direction. Yet another theory considers that sound direction is computed as a “population vector”: each neuron codes for a direction and is associated with a vector oriented in that direction, with a magnitude equal to its firing rate; the population vector is the sum of all these vectors.
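To make these three read-outs concrete, here is a minimal numerical sketch (the preferred directions and firing rates are invented for the example, and the read-out formulas are simplified):

```python
import numpy as np

# Illustrative sketch of the three read-outs (the preferred directions and firing
# rates below are invented for the example).
preferred = np.deg2rad(np.array([-90.0, -45.0, 0.0, 45.0, 90.0]))  # tuning peaks
rates = np.array([2.0, 8.0, 20.0, 12.0, 3.0])                      # firing rates (Hz)

# 1) labelled line: the direction of the most active neuron
direction_max = preferred[np.argmax(rates)]

# 2) total firing rate: a single number that covaries with direction;
#    mapping it back to an angle must be supplied by the observer
total_rate = rates.sum()

# 3) population vector: each neuron contributes a vector along its preferred
#    direction, scaled by its rate; the read-out is the angle of the sum
vec = np.array([np.cos(preferred), np.sin(preferred)]) @ rates
direction_popvec = np.arctan2(vec[1], vec[0])
```

Note that in all three cases it is the person running the script, not the network, who relates the resulting number to a direction in the world, which is precisely the problem raised above.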

Implicit in these representational theories is the idea that some other part of the brain “decodes” the neural representation into the sound’s direction, which ultimately leads to perception and behavior. However, this part is left unspecified in the model: neural models stop at the representational level, and the decoding is done by the external observer (using some formula). But the postulate of a subsequent neural decoder is problematic. Let us assume there is one. It takes the “neural representation” and transforms it into the target quantity, which is sound direction. But the output of a neuron is not a direction; it is a firing pattern or rate that can perhaps be interpreted as a direction. So how is sound direction represented in the output of the neural decoder? It appears that the decoder faces the same conceptual problem: the relationship between output neural activity and the actual quantity in the world (sound direction) has to be interpreted by the external observer. In other words, the output is still a representation. The representational approach leads to an infinite regress.

Since neurons are in the brain and things (sound sources) are in the world, the only way to avoid an external “decoding” stage that relates the two is to include both the world and the brain in the perceptual model. In the example above, this means that, to understand how neurons estimate the direction of a sound source, one would not look for the “neural representation” of sound sources but for neural mechanisms that, embedded in an environment, lead to some appropriate orienting behavior. In other words, neural models of perception are not complete without an interaction with the world (i.e., without action). In this new framework, “neural representations” become a minor issue, one for the external observer looking at neurons.

Neural coding and the invariance problem

In sensory systems, one of the hardest computational problems is the “invariance problem”: the same perceptual category can be associated with a large diversity of sensory signals. A classical example is the problem of recognizing a face: the same face can appear with different orientations relative to the observer and under different lighting conditions, and it is a challenge to design a recognition system that is invariant to these sources of variation.

In computational neuroscience, the problem is usually framed within the paradigm of statistical learning theory as follows. Perceptual categories belong to some set Y (the set of faces). Sensory signals belong to some high-dimensional sensory space X (e.g. pixels). Each particular category (a particular face) corresponds to a specific set of signals in X (different views of the face) or to a distribution on X. The goal is to find the correct mapping from X to Y from particular labeled examples (a particular view x of a face, the name y corresponding to that face). This is also the view that underlies the “neural coding” paradigm, where there is a communication channel between Y and X, and X contains “information” about Y.
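As a toy illustration of this framing (my sketch; random vectors stand in for views, and a nearest-neighbour rule is an arbitrary choice of learning algorithm):

```python
import numpy as np

# Toy illustration of the statistical-learning framing (random vectors stand in
# for views, and a nearest-neighbour rule is an arbitrary choice of learner):
# X is a high-dimensional sensory space, Y a set of labels, and learning means
# fitting a map from X to Y from labelled examples.
rng = np.random.default_rng(0)
n_train, n_test, dim = 100, 10, 256           # e.g. 256 "pixels" per view
X_train = rng.normal(size=(n_train, dim))     # labelled views
y_train = rng.integers(0, 5, size=n_train)    # 5 hypothetical identities
X_test = rng.normal(size=(n_test, dim))       # new views to classify

# nearest-neighbour mapping: each new view gets the label of the closest example
dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
y_pred = y_train[np.argmin(dists, axis=1)]
```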

Framed in this way, this is a really difficult problem in general, and it requires many examples to form categories. However, there is a different way of approaching the problem, which follows from the concept of “invariant structure” developed by James Gibson. It starts with the observation that a sensory system does not receive a static input (an image) but rather a sensory flow. This is obvious in hearing (sounds are carried by acoustic waves, which vary in time), but it is also true of vision: the eyes are constantly moving even when fixating an object (e.g. high-frequency tremors). A perceptual system is looking for things that do not vary within this sensory flow, the “invariant structure”, because this is what defines the essence of the world.

I will develop the example of sound localization. When a source produces a sound, there are time-varying acoustical waves propagating in the air, and possibly reaching the ears of a listener. The input to the auditory system is two time-varying signals. Through the sensory flow, the identity and spatial location of the source are unchanged. Therefore, any piece of information about these two things must be found in properties of the auditory signals that are invariant through the sensory flow. For example, if we neglect sound diffraction, the fact that one signal is a delayed copy of the other, with a particular delay, is true as long as the sound exists. An invariant property of the acoustic signals is not necessarily about the location of the sound source. It could be about the identity of the source, for example the identity of the speaker. However, if that property is no longer invariant when movements are produced by the organism, then that property cannot be an intrinsic property of the source, but rather something about the relative location of the sound source.
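A small sketch can make the idea of an invariant structure within the sensory flow concrete (this is only an idealized simulation, with a pure delay and broadband noise standing in for the source):

```python
import numpy as np

# Idealized sketch (mine): a binaural "sensory flow" in which the right signal is
# a pure delayed copy of the left one (no diffraction), with broadband noise
# standing in for the source. The waveform changes all the time, but the delay
# estimated on successive windows is the same: it is an invariant of the flow.
rng = np.random.default_rng(1)
fs = 44100                        # sampling rate (Hz)
delay = 20                        # interaural delay in samples (~0.45 ms)
left = rng.normal(size=fs)        # 1 s of broadband noise
right = np.roll(left, delay)

win = 2048
for start in range(0, fs - win, win):
    l, r = left[start:start + win], right[start:start + win]
    lags = np.arange(-50, 51)
    corr = [np.dot(l, np.roll(r, -lag)) for lag in lags]
    print(start, lags[np.argmax(corr)])   # the same lag in (essentially) every window
```

The waveform in each window is different, yet the estimated delay is the same in every window: it is a property of the flow, not of any particular snapshot.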

In this framework, the computational problem of sound localization has two stages: 1) for any single example, pick up an acoustical invariant that is affected by head movements; 2) associate these acoustical invariants with sound location (either externally labeled, or defined by head movements). The second stage is essentially the computational problem defined in the neural coding/statistical learning framework. But the first stage is entirely different. It is about finding an invariant property within a single example, and this only makes sense if there is a sensory flow, i.e., if time is involved within a single example and not just across examples.

There is a great benefit in this approach, which is to solve part of the invariance problem from the beginning, before any category is assigned to an example. For example, a property about the binaural structure produced by a broadband sound source at a given position will also be true for another sound source at the same position. In this case, the invariance problem has disappeared entirely.

Within this new paradigm, the learning problem is now: given a set X of time-varying sensory signals produced by sound sources, how to find a mapping from X to some other space Y such that the images of sensory signals through this mapping do not vary over time, but vary across sources? Phrased in this way, this is essentially the goal of slow feature analysis. However, the slow feature analysis algorithm is a machine learning technique whose biological instantiation is not straightforward.
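For concreteness, here is a minimal sketch of the linear version of slow feature analysis (my own illustration, with a toy mixture of one slow and one fast source; the full algorithm typically also includes a nonlinear expansion of the inputs, which is omitted here):

```python
import numpy as np

# Minimal sketch of linear slow feature analysis (my illustration, with a toy
# mixture of one slow and one fast source): find the linear read-out of the
# inputs whose output varies as slowly as possible over time, under unit variance.
rng = np.random.default_rng(2)
t = np.linspace(0, 2 * np.pi, 1000)
slow = np.sin(t)                                  # hidden slow source
fast = rng.normal(size=t.size)                    # hidden fast source
X = np.column_stack([slow + 0.1 * fast, fast])    # observed mixtures

X = X - X.mean(axis=0)
# whiten the inputs, so that any unit vector gives a unit-variance output
eigval, eigvec = np.linalg.eigh(np.cov(X.T))
W_white = eigvec / np.sqrt(eigval)
Z = X @ W_white

# the slowest direction minimizes the variance of the temporal derivative
d_eigval, d_eigvec = np.linalg.eigh(np.cov(np.diff(Z, axis=0).T))
w_slow = W_white @ d_eigvec[:, 0]    # direction of smallest derivative variance
y = X @ w_slow                       # extracted slow feature (the slow source, up to sign and scale)
```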

There have been similar ideas in the field. In a highly cited paper, Peter Földiak proposed a very simple unsupervised Hebbian rule based on related considerations (Földiak, Neural Comp 1991). The study focused on the development of complex cells in the visual system, which respond to edges independently of their location. The complex cell combines inputs from simple cells, which respond to specific edges, and the neuron must learn the right combination. The invariance is learned by presenting moving edges; that is, it is looked for within the sensory flow and not across independent examples. The rule is very simple: it is a Hebbian rule in a rate-based model, where the instantaneous postsynaptic activity is replaced by a moving average. The idea is simply that, if the output must be temporally stable, then the presynaptic activity should be paired with the output at any time. Another paper by Schraudolph and Sejnowski (NIPS 1992) is actually about finding the “invariant structure” (with no mention of Gibson) using an anti-Hebbian rule, but this means that neurons signal the invariant structure by not firing, which is not what neurons in the MSO seem to be doing (although perhaps the idea might be applicable to the LSO).
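The flavor of Földiak’s proposal can be conveyed by a simplified rate-based sketch (the constants and the weight normalization below are my own illustrative choices, not his exact equations):

```python
import numpy as np

# Simplified rate-based sketch of a Földiak-style "trace" rule (the constants and
# the weight normalization are my own illustrative choices, not his exact
# equations): the instantaneous postsynaptic activity in the Hebbian product is
# replaced by a running average, so that inputs occurring at different moments of
# the sensory flow become associated with the same output.
def trace_hebbian(x_seq, w, lr=0.01, decay=0.2):
    y_trace = 0.0
    for x in x_seq:                                  # x: presynaptic rates at one time step
        y = np.dot(w, x)                             # postsynaptic rate (linear unit)
        y_trace = (1 - decay) * y_trace + decay * y  # moving average of the output
        w = w + lr * y_trace * x                     # Hebbian term uses the trace, not y
        w = w / np.linalg.norm(w)                    # keep the weights bounded
    return w
```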

There is a more recent paper in which slow feature analysis is formally related to Hebbian rules and to STDP (Sprekeler et al., PLoS CB 2007). Essentially, the argument is that minimizing the temporal variation of the output is equivalent to maximizing the variance of the low-pass filtered output. In other words, they provide a link between slow feature analysis and Földiak’s simple algorithm. There are also constraints, in particular that the synaptic weights must be normalized. Intuitively this is obvious: to aim for a slowly varying output is the same thing as to aim for increasing the low-frequency power of the signal. The angle in the paper is rather on rate models, but it gives a simple rationale for designing learning rules that promote slowness. In fact, it appears that the selection of slow features follows from the combination of three homeostatic principles: maintaining a target mean potential, maintaining a target variance, and minimizing the temporal variation of the potential (through maximizing the variance of the low-pass filtered signal). The potential may be replaced by the calcium trace of spike trains, for example.
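The intuition can be written down in one line; for a zero-mean signal y(t) with power spectrum S(ω) and fixed variance (this is my paraphrase of the argument, in an idealized stationary setting):

```latex
\langle \dot{y}^{\,2} \rangle \;=\; \int \omega^{2}\, S(\omega)\, d\omega,
\qquad
\langle y^{2} \rangle \;=\; \int S(\omega)\, d\omega \;=\; 1.
```

Minimizing the left-hand quantity under the fixed-variance constraint pushes the power S(ω) toward low frequencies, which is the same as maximizing the variance of a low-pass filtered version of the output.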

It is relatively straightforward to see how this might be applied to the problem of learning to decode the activity of ITD-sensitive neurons in the MSO into the location of the sound source. For example, a target neuron combines inputs from the MSO into its membrane potential, and the slowness principle is applied to either the membrane potential or the output spike train. As a result, we expect the membrane potential and the firing rate of this neuron to depend only on sound location. These neurons could be in the inferior colliculus, for example.

But can this principle also be applied to the MSO? In fact, the output of a single neuron in the MSO does not depend only on sound location, even for those neurons with a frequency-dependent best delay. Their output also depends on sound frequency, for example. But is it possible that their output is as slow as possible, given the constraints? It might be so, but another possibility is that only some property of the entire population is slow, and not the activity of individual neurons. For example, in the Jeffress model, only the identity of the maximally active neuron is invariant. But then we face a difficult question: what learning criterion should be applied at the level of an individual neuron so that some combination of the activities of all neurons is slow?

I can imagine two principles. One is a backpropagation principle: the value of the criterion in the target neurons, i.e., slowness, is backpropagated to the MSO neurons and acts as a reward. The second is that the slowness criterion is applied at the cellular level not to the output of the cell, but to a signal representing a combined activity of neurons in the MSO, for example the activity of neighboring neurons.

What does it mean to represent a relation?

In this blog, I have argued many times that if there are neural representations, these must be about relations: for example, a relation between two sensory signals, or a relation between a potential action and its effect on the sensory signals. But what does it mean exactly for something (say, neural activity) to “represent” a relation? It turns out that the answer is not so obvious.

The classical way to understand it is to consider that a representation is something (an event, a number of spikes, etc.) that stands for the thing being represented. That is, there is a mapping between the thing being represented and the thing that represents it. For example, in the Jeffress model of sound localization, the identity of the most active binaural neuron stands for the location of the sound, or in terms of relation, for the fact that the right acoustical signal is a delayed copy of the left acoustical signal, with a specific delay. The difficulty here is that a representation always involves three elements: 1) the thing to be represented, 2) the thing that represents it, 3) the mapping between the first two. But in the classical representational view, we are left with only the second element. In what sense does the firing of a binaural neuron tell us that there is such a specific relation between the monaural signals? Well, it doesn’t, unless we already know in advance that this is what the firing of that neuron stands for. But from observing the firing of the binaural neurons, there is no way we can ever know that: we just see neurons lighting up sometimes.
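To fix ideas, here is a schematic, rate-based sketch of the Jeffress scheme (my simplification: correlations instead of spiking coincidence detectors, and an arbitrary acoustic delay):

```python
import numpy as np

# Schematic, rate-based sketch of the Jeffress scheme (my simplification:
# correlations instead of spiking coincidence detectors). Each "binaural neuron"
# compares the right signal with the left signal passed through a different
# internal delay; the neuron whose internal delay matches the acoustic delay
# is the most active one.
rng = np.random.default_rng(3)
left = rng.normal(size=4096)
acoustic_delay = 22                    # samples; the quantity to be recovered
right = np.roll(left, acoustic_delay)

internal_delays = np.arange(0, 41)     # one "neuron" per internal delay
activity = np.array([np.dot(np.roll(left, d), right) for d in internal_delays])

best_neuron = internal_delays[np.argmax(activity)]
print(best_neuron)                     # 22
```

The script recovers the delay, but only because we, who wrote it, know that the index of the most active “neuron” is to be read as an internal delay in samples; nothing in the activity pattern itself carries that mapping.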

There are different ways to address this issue. The simplest one is simply to say: it doesn’t matter. The activity of the binaural neurons represents a relationship between the monaural signals, at least for us external observers, but the organism doesn’t care: what matters is that their activity can be related to the location of the sound source, defined for example as the movement of the eyes that puts the sound source in the fovea. In operational terms, the organism must be able to take an action conditionally on the validity of a given relation, but what this relation exactly is in terms of the acoustical signals doesn’t matter.

An important remark is in order here. There is a difference between representing a relation and representing a quantity (or vector), even in this simple notion of representation. A relation is a statement that may be true or not. This is different from a quantity resulting from an operation. For example, one may always calculate the peak lag of the cross-correlation function between two acoustical signals, and call this “the ITD” (interaural time difference). But such a number is obtained whether there is one source, several sources or no source at all. Thus, this is not the same as a relation of the form: the right signal equals the left signal delayed by 500 µs. Therefore, we are not speaking of a mapping between acoustical signals and action, which would be unconditional, but of actions conditional on a relation in the acoustical signals.
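The distinction can be made concrete with a small sketch (mine; the normalized-correlation criterion one would use to decide that the relation actually holds is an arbitrary illustrative choice):

```python
import numpy as np

# Sketch (mine) of the difference between computing a quantity and testing a
# relation. A peak lag can always be computed, even when the two signals are
# unrelated; the relation "right = left delayed by some lag" may be true or false.
def peak_lag_and_match(left, right, max_lag=50):
    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.array([np.dot(left, np.roll(right, -lag)) for lag in lags])
    best = np.argmax(corr)
    # normalized correlation at the peak: close to 1 only if the relation holds
    match = corr[best] / (np.linalg.norm(left) * np.linalg.norm(right))
    return lags[best], match

rng = np.random.default_rng(4)
a = rng.normal(size=8192)
print(peak_lag_and_match(a, np.roll(a, 17)))         # (17, ~1): the relation holds
print(peak_lag_and_match(a, rng.normal(size=8192)))  # a lag comes out anyway, match ~ 0
```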

Now there is another way to understand the phrase “representing a relation”, which is predictive: if there is a relation between A and B, then representing the relation means that from A and the relation, it is possible to predict B. For example, saying that the right signal is a delayed copy of the left signal, with a delay of 500 µs, means that if I know that the relation is true and I have the left signal, then I can predict the right signal. In the Jeffress model, or in fact in any model that represents the relation in the previous sense, it is possible to infer the right signal from the left signal and the representation, but only if the meaning of that representation is known, i.e., if it is known that the firing of a given neuron stands for “B comes 500 µs after A”. This is an important difference from the previous notion of representation, where the meaning of the relation in terms of acoustics was irrelevant.

We now have a substantial problem: where does the meaning of the representation come from? The firing of binaural neurons in itself does not tell us anything about how to reconstruct signals. To see the problem more clearly, imagine that the binaural neurons develop by selecting axons from both sides. In the end there is a set of binaural neurons whose firing stands for binaural relations with different ITDs. But by just looking at the activity of the binaural neurons after development, or at the activity of both the binaural neurons and the left monaural acoustical signal, it is impossible to know what the ITD is, or what the right acoustical signal is at any time. To be able to do this, one actually needs to have learned the meaning of the representation carried by the binaural neurons, and this learning seems to require both monaural inputs.

It now seems that this second notion of representation is not very useful, since in any case it requires all terms of the relation. This brings us to the notion that to represent a relation, and not just a specific instantiation of it (i.e., these particular signals have this property), it must be represented in a sense that may apply to any instantiation. For example, if I know that a source is at a given location, I can imagine, for any left signal, what the right signal should be. Or, given a signal on the left, I can imagine what the right signal should be if the source were at a given location.

I’m ending this post with probably more confusion than when I started. This is partly intended. I want to stress here that once we start thinking of perceptual representations in terms of relations, then classical notions of neural representations quickly seem problematic or at least insufficient.

Information about what?

In a previous post, I pointed out that the word “information” is almost always used in neuroscience in the sense of information theory, and this is a very restricted notion of information that leads to dualism in many situations. There is another way to look at this issue, which is to ask the question: information about what?

In discussions of neural information or “codes”, there are always three elements involved: 1) what carries the information, e.g. neural electrical activity, 2) what the information is about (e.g. the orientation of a bar), 3) the correspondence between the first two elements. If dualism is rejected, then all the information an organism ever gets from the world must come from its own senses (and the effect of actions on them). Therefore, if one speaks of information for the organism, as opposed to information for an external observer, then the key point to consider is that the information should be about something intrinsic to the organism. For example, it should not be about an abstract parameter of an experimental protocol.

So what kind of information are we left with? For example, there may be information in one sensory signal about another sensory signal, in the sense that one can be (partially) predicted from the other. Or there can be information in a sensory signal about the future signal. This is equivalent to saying that the signals follow some law, a theme developed for example by Gibson (invariant structure) and O’Regan (sensorimotor contingency).

One might think that this conception of information would imply that we can’t know much about the world. But this is not true at all, because there is knowledge about the world coming from the interaction of the organism with the world. Consider space for example. A century ago, Poincaré noted that space, with its topology and structure, can be entirely defined by the effect of our own movements on our senses. To simplify, assume that the head and eyes are fixed in our body and we can move only by translations, although with possibly complex movements. We can go from point A to point B through some movements. Points A and B differ by the visual inputs. Movements act on the set of points (= visual inputs) as a group action that has the structure of a two-dimensional Euclidean space (for example, for each movement there is an opposite movement that takes you back to the previous point, combinations of movements are commutative, etc.). This defines space as a two-dimensional affine space. In fact, Poincaré (and also Einstein) went further and noted that space as we know it is necessarily defined with respect to an observer. For example, we can define absolute Cartesian coordinates for space, but the reference point is arbitrary; only relative statements are actually meaningful.
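As a toy illustration of Poincaré’s construction (nothing more than two-dimensional translations acting on points, with the group properties checked explicitly):

```python
import numpy as np

# Toy illustration (mine) of Poincaré's construction: "points" are identified with
# the visual inputs obtained there, and movements act on them as a group of
# two-dimensional translations.
move_a = np.array([1.0, 2.0])      # one movement (a translation)
move_b = np.array([-3.0, 0.5])     # another movement
point = np.array([0.0, 0.0])       # some starting point (= some visual input)

# group structure: composition is commutative, and every movement has an inverse
# that takes you back to the previous point
assert np.allclose(point + move_a + move_b, point + move_b + move_a)
assert np.allclose(point + move_a + (-move_a), point)
```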

In summary, it is not so much that the concept of information or code is completely irrelevant in itself. The issues arise when one speaks of codes about something external to the organism. In the end, this is nothing other than a modern version of dualism (as Dennett pointed out with his “Cartesian theater”). Rejecting dualism implies that any information relevant to an organism must be about something that the organism can do or observe, not about what an external observer can define.

Complex representations and representations of complex things

In a previous post, I noted that the concept of neural assembly is limited by the fact that it does not represent relations. But this means that it is not possible to represent in this way a complex thing such as a car or a face. This might seem odd, since many authors claim that there are neurons or groups of neurons that code for faces (in IT). I believe there might again be some confusion between representation and information in Shannon’s sense. What is meant when it is stated that an assembly of neurons codes for a face is that its activity stands for the presence of a face in the visual field. So in this sense the complex thing, the face, is represented, but the representation itself is not complex. With such a concept of representation, complex things can only be represented by removing all complexity.

This is related to the problem of invariant representations. How is it that we can recognize a face under different viewpoints, lighting conditions, and possibly changes in hair style and facial expression? One answer is that there must be a representation that is invariant, i.e., a neural assembly that codes for the concept “Paul’s face” independently of the specific way it can appear. However, this is an incomplete answer, for when I see Paul’s face, I can recognize that it’s Paul, but I can also see that he smiles, that I’m looking at him from the side, that he has dyed his hair black. It’s not that by some process I have managed to remove all details that are not constituent of the identity of Paul’s face; rather, I am seeing everything that makes Paul’s face, both in the way it usually appears and in the specific way it appears this time. So the fact that we can recognize a complex thing in an invariant way does not mean that the complexity itself is discarded. In reality we can still register this complexity, and our mental representation of a complex thing is indeed complex. As I argued before, the concept of neural assembly is too crude to capture such complexity.

The concept of invariance is even more interesting when applied to categories of objects, for example a chair. In contrast with Paul’s face, different chairs are not just different viewpoints on the same physical object; they really are different physical objects. They can have different colors, widely different shapes and materials. They usually have four legs, but surely we would recognize a three-legged chair as such. What really makes a chair is that one can sit on it, with one’s back in contact with it. This is related to Gibson’s concept of “affordances”: Gibson argued that we perceive the affordances of things, i.e., the possibilities of interaction with things.

So now I could imagine that there is an assembly of neurons that codes for the category “chair”. This is fine, but it is only something that stands for the category; it does not describe what this category is. It is not the representation of an affordance. Representing an affordance would involve representing the potential actions that one could make with the object. I do not know what kind of neural representation would be adequate, but it would certainly have to be more complex (i.e., structured) than a neural assembly.

On the notion of information in neuroscience

In a previous post, I criticized the notion of “neural code”. One of my main points was that information can only make sense in conjunction with a particular observer. I am certainly not the first one to make this remark: for example, it is presented in a highly cited review by deCharms and Zador (2000). More recently, Buzsaki defended this point of view in a review (Neuron, 2010), and from the notes in the supplemental material, it appears that he is clearly more philosophically lucid than the average neuroscientist on these issues (check the first note). I want to come back to this issue in more detail.

When one speaks of information or code in neuroscience, it is generally meant in the sense of Shannon. This is a very specific notion of information coming from communication theory. There is an emitter who wants to transmit some message to a receiver. The message is transmitted in an altered form called a “code”, for example Morse code, which contains “information” insofar as it can be “decoded” by the receiver into the original message. The metaphor is generally carried over to neuroscience in the following form: there are things in the external world that are described in some way by the experimenter, for example bars with a variable orientation, and the activity of the nervous system is seen as a “code” for this description. It may carry “information” about the orientation of the bar insofar as one can reconstruct the orientation from the neural activity.

It is important to realize how limited this metaphor is, and indeed that it is a metaphor. In a communication channel, the two ends agree upon a code, for example the correspondence between letters and Morse code. For the receiving end, the fact that the message is information in the common sense of the word relies on two things: 1) that the correspondence is known, and 2) that the initial message itself makes sense for the receiver. For example, imagine that, a few centuries ago, someone is given a papyrus with ancient Egyptian hieroglyphs. It will probably represent very little information for that person, because she has no way to make sense of it. The papyrus becomes informative with the Rosetta stone, on which the same text is written in ancient Egyptian and in ancient Greek, so that the papyrus can be translated into ancient Greek. But of course this becomes information only if ancient Greek makes sense for the person who reads it!

So the metaphor of a “neural code”, understood in Shannon’s sense, is problematic in two ways: 1) the experimenter and the nervous system obviously do not agree upon a code, and 2) how the original “message” makes sense for the nervous system is left entirely unspecified. I will give another example to make this clearer. Imagine you have a vintage (non-digital) thermometer, but that thermometer does not have any graduations. You could replace the thermometer by the activity of a temperature-sensitive neuron. From the point of view of information theory, there is just as much information about temperature in the liquid level as if temperature were given as a number of degrees Celsius. But clearly, for an observer there is very little information, because one does not know the relationship between the level of the liquid and the physical temperature, so it is essentially useless. Perhaps one could say that the level says something relative about temperature, that is, whether one temperature is hotter than another. But even this is not true, because it relies on the prior knowledge that the level of the liquid increases when the temperature increases, a physical law that is not obvious at all. So to make sense of the liquid level, one would actually rely on associations with other sources of information that are not given by the thermometer, e.g. that for some level one feels cold and that for another level one feels hot. But now this means that the information in the liquid level is actually limited (and in fact defined) not by the “communication channel” (how accurate the thermometer is) but by the external source of knowledge that provides meaning to the liquid level. This limitation comes from the fact that at no moment in time is the true temperature in Kelvin given as an objective truth to the observer. The only way the observer gets information is through its own sensors. This is why Shannon’s information is highly misleading as a metaphor for information in biological systems: there can be no agreed code between the environment and the organism. The organism has to learn ancient Egyptian with nothing but the hieroglyphs.

To finish with this example, imagine now that the thermometer is graduated, so you can read the temperature. Wouldn’t this provide the objective information that was previously missing? As a matter of fact, not really. For example, as a European, if I am given the temperature in Fahrenheit degrees, I have no idea whether it is hot or cold. So the situation is not different for me than previously. Of course if I am also given the correspondence between Fahrenheit and Celsius, then it will start making sense for me. But how can Celsius degrees make sense for me in the first place? Again these are just numbers with arbitrary units. Celsius degrees make sense because they can be related to physical processes linked with temperature: water freezes at 0° and boils at 100°. Presumably, the same thing applies to our perception of temperature: the body senses a change in firing rate of some temperature-sensitive neuron, and this becomes information about temperature because it can be associated with a number of biophysical processes linked with temperature, say sweating, and all these effects can be noticed. In fact, what this example shows is that the activity of the temperature-sensitive neuron does not provide information about physical temperature (number of Kelvin degrees), but rather about the occurrence of various other events that can be captured with other sensors. This set of relationships between events is, in a way, the definition of temperature for the organism, rather than some number in arbitrary units.

Let us summarize. In Shannon’s information theory, it is implicitly assumed that there are two ends in a communication channel, and that 1) both ends agree upon a code, i.e., a correspondence between descriptive elements of information on both ends, and 2) the initial message on the emitter end makes sense for the observer at the other end. Neither of these two assumptions applies to a biological organism, because there is only one end. All the information that it can ever get about the world comes from that end, and so in this context Shannon’s information only makes sense for an external observer who can see both ends. A typical error coming from the failure to realize this fact is to grossly overestimate the information in neural activity about some experimental quantity. I discussed this specific point in detail in a recent paper. The overestimation comes simply from the fact that detailed knowledge about the experiment is implicitly assumed on behalf of the nervous system.

Followed to its logical conclusions, the information-processing line of reasoning leads to what Daniel Dennett called the “Cartesian theater”. If neural activity gives information about the world in Shannon’s sense, then this means that at some final point this neural activity has to be analyzed and related to the external world. Indeed if this does not happen, then we cannot be speaking about Shannon information, for there is no link with the initial message. So this means that there is some critical stage at which neural activity is interpreted in objective terms. As Dennett noted, this is conceptually not very far from the dualism of Descartes, who thought that there is a non-material mind that reads the activity of the nerves and interprets it in terms of the outside physical world. The “Cartesian theater” is the brain seen as a screen where the world is projected, that a homunculus (the mind) watches.

Most neuroscientists reject dualism, but if one is to reject dualism, then there must be no final stage at which the observer end of the communication channel (the senses) is put in relationship with the emitter end (the world). All information about the world must come from the senses, and the senses alone. Therefore, this “information” cannot be meant in Shannon’s sense.

This, I believe, is essentially what James Gibson meant when he criticized the information-processing view of cognition. It is also related to Hubert Dreyfus’s criticism of artificial intelligence. More recently, Kevin O’Regan made similar criticisms. In his most cited paper with Noë (O’Regan and Noë, BBS 2001), there is an illuminating analogy, the “villainous monster”. Imagine you are exploring the sea with an underwater vessel. But a villainous monster mixes all the cables and so all the sensors and actuators are now related to the external world in a new way. How can you know anything about the world? The only way is to analyze the structure of sensor data and their relationships with actions that you can perform. So if one rejects dualism, then this is the kind of information that is available to the nervous system. A salient feature of this notion of information is that, contrary to Shannon’s information, it is defined not as numbers but as relations or statements: if I do action A, then sensory property B happens; if sensory property A happens, then another property B will happen next; if I do action A in sensory context B, then C happens.

 

Philosophy of knowledge

We have concluded that, if dualism is to be rejected, then the right notion of information for a biological organism is in terms of statements. This makes the problem of perception quite similar to that of science. Science is made of universal statements, such as the law of gravitation. But not all statements are scientific, for example “there is a God”. In philosophy of knowledge, Karl Popper proposed that a scientific statement is one that can potentially be falsified by an observation, whereas a metaphysical statement is a statement that cannot be falsified. For example, the statement “all penguins are black” is scientific, because I could imagine that one day I see a white penguin. On the other hand, the statement “there is a God” is metaphysical, because there is no way I can check. Closer to the matter of this text, the statement “the world is actually five-dimensional but we live in a three-dimensional subspace” is also metaphysical because independently of whether it is true or not, we have no way to confirm it or falsify it.

To come back to the matter of this text, I propose to qualify as metaphysical, for an organism, all knowledge that cannot be falsified given its senses and possibilities for action. For example, in an experiment, one could relate the firing rate of a neuron to the orientation of a bar presented in front of the eyes. There is information, in Shannon’s sense, about the orientation in the firing rate. This means that we can “decode” the firing rate into the parameter “orientation”. However, this decoding requires metaphysical knowledge, because “orientation” is defined externally by the experimenter; it does not come out of the neuron’s activity itself. From the neuron’s point of view, there is no way to falsify the statement “10 Hz means horizontal bar”, because the notion of horizontal (or bar) is either defined in relation to something external to the neuron, or by its activity itself (horizontal is when the activity is 10 Hz), and in this latter case the statement is a tautology.

Therefore it appears that, without metaphysical knowledge, there can be very little information in the response of a single neuron, or in its input. Note that it is not completely empty, for there could be information about the future state of the neuron in its present state.

 

The structure of information and “neural assemblies”

When information is understood as statements rather than numbers to be decoded, it appears that information to be represented by the brain is much richer than implied by the usual notion inspired by Shannon’s communication theory. In particular, the problem of perception is not just to relate a vector of numbers (e.g. firing rates) to a particular set of parameters representing an object in the world. What is to be perceived is much richer than that. For example, in a visual scene, there could be Paul, a person I know, wearing a new sweater, sitting in a car. What is important here is that a scene is not just a “bag of objects”: objects have relationships with each other, and there are many possible different relationships. For example there is a car and there is Paul, and Paul is in a specific relationship with the car, that of “sitting in it”.

Unfortunately this does not fit well with the concept of “neural assemblies”, which is the mainstream assumption about how the things we perceive are represented in the brain. If it is true that any given object is represented by the firing of a given assembly of neurons, then several objects should be represented by the firing of a bigger assembly of neurons, the union of all the assemblies, one for each object. Several authors have noted that this may lead to the “superposition catastrophe”, i.e., there may be different sets of objects whose representations are fused into the same big assembly. But let us assume that this problem has somehow been solved and that there is no possible confusion. Still, the representation of a scene can be nothing more than an unstructured “bag of objects”: there are no relationships between objects in the assembly representation. One way to save the assembly concept is to consider that there are combination assemblies, which code for specific combinations of things, perhaps in a particular relationship. But this cannot work if it is the first time I see Paul in that sweater. There is a fundamental problem with the concept of neural assembly, which is that there is no representation of relations, only of the things to be related. In analogy with language, there is no syntax in the concept of neural assemblies. This is actually the analogy chosen by Buzsaki in his recent Neuron review (2010).

This remark, mostly made in the context of the binding problem, has led authors such as von der Malsburg to postulate that synchrony is used to bind the features of an object, as represented by neural firing. This avoids the superposition catastrophe because at a given time, only one object is represented by neural firing. It also addresses the problem of composition: by defining different timescales for synchrony, one may build representations of objects composed of parts, possibly in a recursive manner. However, the analogy with language shows that this is not going to be enough, because only one type of relation can be represented in this way. But the same analogy also shows that it is conceptually possible to represent structures as complex as linguistic structure by using time, in analogy with the flow of a sentence. Just for the sake of argument, and I do not mean that this is a plausible proposition (although it could be), you could imagine that assemblies code either for things (Paul, a car, a sweater) or for relations between things (sitting, wearing), that only one assembly is active at a time, and that the order of activation indicates which things a relation applies to. Here not only synchrony is important, but also the order of spikes. This idea is quite similar to Buzsaki’s “neural syntax” (based on oscillations), but I would like to emphasize a point that I believe has not been noticed: that assemblies must stand not only for things but also for relations between things (note that “things” can also be thought of as relations, in which case we are speaking of relations of different orders).

All this discussion, of course, is only meant to save the concept of neural assembly and perhaps one might simply consider that a completely different concept should be looked for. I do not discard this more radical possibility. However, I note that if it is admitted that neurons interact mostly with spikes, then somehow the spatio-temporal pattern of spikes is the only way that information can be carried. Unless, perhaps, we are completely misled by the notion of “information”.

Rate vs. timing (XIV) The neural "code"

I am making a break before I continue on my review of spike-based theories, and I want to comment on the notion of “neural code”. These are keywords that are used in a large part of the neuroscience literature, and I think they are highly misleading. For example, you could say that neurons in V1 “code for orientation”. But what this statement refers to, in reality, is simply that if we record the response of such a neuron to an oriented bar, then we observe that its firing rate is modulated by the orientation, peaking at an orientation that is then called the “preferred orientation”. First of all, the notion of a “preferred orientation” is just a definition tied to the specific experimental protocol (the same is true for the notion of best delay in sound localization). Empirically speaking, it is an empty statement. In particular, by itself it does not mean that the cell actually “prefers” some orientations to others in any way, because a preferred orientation can be defined in any case and could be different for different protocols – it is just the stimulus parameter giving maximum response. So the only empirical statement associated to the claim “the neuron codes for orientation” is in fact: the neuron’s firing rate varies with orientation. Therefore, using the word “codes” is just a more appealing way of saying “varies”, but the empirical content is actually no more than “varies”.

In what sense, then, can we say that the neuron “codes” for orientation? Coding means presenting some information in a way that can be decoded. That is, the neuron codes for an orientation with its firing rate in the sense that from its firing rate it is possible to infer the orientation. Here we get to the first big problem with the notion of a “neural code”. If the firing rate varies with orientation and one knows exactly how (quantitatively), then of course it is possible to infer some information about orientation from the firing rate. The way one would decode a particular firing rate into an estimated orientation is by looking at the tuning curve, obtained with the experimental protocol, and looking for the orientation that gives the best-matching firing rate. But this means that the decoding process, and therefore the code, is meant from the experimenter’s point of view, not from the organism’s point of view. The organism does not know the experimental protocol, so it cannot make the required inference. If all the organism can use to decode the orientation is a number of spikes, then clearly this task is nearly impossible, because without additional knowledge, a tremendous number of stimuli could produce that same number of spikes (e.g. by varying contrast, or simply by presenting something other than a bar). Thus the first point is that the notion of a code is experimenter-centric, so talking about a “neural code” in this sense is highly misleading, as the reader of this code is not neurons but the experimenter.
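Here is what that decoding procedure amounts to, written out explicitly (the tuning curve below is invented for the example):

```python
import numpy as np

# Sketch (mine) of the decoding implicitly assumed in such statements: look up the
# experimenter's tuning curve and return the orientation whose expected rate best
# matches the observed rate. The tuning curve below is invented for the example.
orientations = np.linspace(0, 180, 19)                                  # protocol: bar orientations (deg)
tuning = 5 + 15 * np.exp(-((orientations - 60) ** 2) / (2 * 20 ** 2))   # expected rate (Hz)

def decode(observed_rate):
    # note the ambiguity: rates below the peak match two orientations,
    # one on each side of the preferred orientation
    return orientations[np.argmin(np.abs(tuning - observed_rate))]

print(decode(18.0))   # an orientation near the preferred one (60 deg)
```

The lookup table, i.e., knowledge of the experimental protocol, is doing all the work; the neuron itself only supplies one number.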

So the first point is that, if the notion of a neural code is to make any sense at all, it should be refined so as to remove any reference to the experimental protocol. One clear implication is that the idea that a single neuron can code for anything is highly questionable: is it possible to infer anything meaningful about the world from a single number (a spike count), with no a priori knowledge? Perhaps the joint activity of a set of neurons may make more sense. This reduces the interest of “tuning curves” in terms of coding – they may still be informative about what neurons “care about”, but not about how they represent information, if there is such a thing. Secondly, removing any reference to the experimental protocol means that one can speak of a neural code for orientation only if it does not depend on other aspects, e.g. contrast. Indeed, if the responses were sensitive to orientation but also to everything else in the stimulus, how could one claim that the neuron codes for orientation? Finally, thinking of a code with a neural observer in mind means that, perhaps, not all codes make sense. Indeed, is the function of V1 to “represent” the maximum amount of visual information? This view, and the search for “optimal codes” in general, seems very odd from the organism’s point of view: why devote so many neurons and so much energy to representing exactly the same amount of information that is already present in the retina? If a representation has any use, then this representation must be different in nature from the original presentation, and not just in content. So the point is not about how much information there is, but in what form it is represented. This means that codes cannot be envisaged independently of a potential decoder, i.e., of a specific way in which neurons use the information.

I now come to a deeper criticism of the notion of a neural code. I started by showing that the notion is often meant in the sense of a code for the observer, not for the brain. But let us say that we have fixed that error and are now looking for neural codes, with a neural-centric view rather than an experimenter-centric view. Still, the methodology is to look at neural responses (rates or spike timings) and to try to find how much information there is and in what form. Clearly then, the notion that neurons code for things is not an empirical finding: it is an underlying assumption of the methodology. It starts, not ends, by assuming that neurons fire so that the rest of the brain can observe this firing and extract the information it contains. It is postulated, not observed, that what neurons do is produce some form of representation for the rest of the brain to see. This appears to be very much centered on the way we, external observers, acquire knowledge about the brain, and it has a strong flavor of the homunculus fallacy.

I suggest we consider another perspective on what neurons do. Neurons are cells that continuously change in many respects, molecular and electrical. Even though we may want to describe some properties of their responses, spikes are transient signals; there is nothing persistent in them, in the way there is in a painting. So neurons do not represent the world in the way a painter would. Second, spikes are not things that a neuron leaves there for observers to see, like the pigments on a painting. On the contrary, a neuron produces a spike and actively sends it to target neurons, where changes will occur because of this spike. This is much more like an action than like a representation. Thus it is wrong to say that the postsynaptic neuron “observes” the activity of presynaptic neurons. Rather, it is influenced by it. So neural activity is not a representation; it is rather an action.

To briefly summarize this post: neurons do not code. This is a view that can only be adopted by an external observer, but it is not very meaningful to describe what neurons do. Perhaps it is more relevant to say that neurons compute. But the best description, probably, is to say that neurons act on other neurons by means of their electrical activity. To connect with the general theme of this series, these observations emphasize the fact that the basis of neural computation is truly spikes, and that rates are an external observer-centric description of neural activity.

"The brain uses all available information"

In discussions of “neural coding” issues, I have often heard the idea that “the brain uses all available information”. This idea generally pops up in response to the observation that neural responses are complex and vary with stimuli in ways that are difficult to comprehend. In this variability there is information about stimuli, and as complex as the mapping from stimuli to neural responses may be, the brain might well be able to invert this mapping. I sympathize with the notion that neural heterogeneity is information rather than noise, but I believe that, phrased in this way, this idea reveals two important misconceptions.

First of all, there is often a confusion between sensitivity (responses vary along several stimulus dimensions) and information (you can recover these dimensions from the responses). I made this point in a specific paper two years ago (pdf). Neural responses are observed for a specific experimental protocol, which is always constrained to a limited set of stimuli. One can often recover stimulus dimensions from the responses within this set, but it is a mistake to conclude that the brain can do it, because this inverse mapping depends on the particular experimental set of stimuli. In other words, the mapping is in fact from the observed neural responses and the knowledge of the experimental protocol to the stimulus. The brain does not have access to such external knowledge. Therefore, information is always highly overestimated in this type of analysis. This is in fact a classical problem in machine learning, related to the issues of training vs. test error, generalization and overfitting. The key concept is robustness: the hypothesized inverse mapping should be robust to large changes in the set of stimuli.
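A minimal sketch of how this overestimation arises (a made-up model neuron whose rate depends on both orientation and contrast; the decoder is built within a protocol where contrast is fixed):

```python
import numpy as np

# Sketch (mine) of how information gets overestimated. A made-up model neuron's
# rate depends on both orientation and contrast. Within a protocol where contrast
# is fixed, the rate can be inverted into an orientation; outside that protocol
# the same inverse mapping fails, because it implicitly relied on the protocol.
def rate(orientation_deg, contrast):
    return contrast * (5 + 15 * np.cos(np.deg2rad(2 * (orientation_deg - 60))) ** 2)

orientations = np.linspace(0, 180, 181)
lookup = rate(orientations, contrast=1.0)       # "tuning curve" at the protocol's contrast

def decode(observed_rate):
    return orientations[np.argmin(np.abs(lookup - observed_rate))]

print(decode(rate(40.0, contrast=1.0)))   # within the protocol: recovers 40
print(decode(rate(40.0, contrast=0.5)))   # different contrast: a wrong orientation
```

Within the protocol the decoding looks informative; outside it, the same inverse mapping fails, because it implicitly relied on knowledge of the protocol.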

The second misconception is more philosophical, and has to do with the general investigation of “neural codes”. What is a code? It is a way of representing information. But sensory information is already present at the level of sensory inputs, and it is a theorem that information can only decrease along a processing chain. So if we say that the goal of a code is only to represent the maximum amount of information about stimuli, then what is gained by having a second (central) code, which can only be a degraded version of the initial sensory inputs? Thinking in this way is in fact committing the homunculus fallacy: looking at the neural responses as a projection of sensory inputs, which “the brain” observes. This projection achieves nothing, for it still leaves unexplained how the brain makes sense of sensory inputs – nothing has been gained in terms of what these inputs mean. At some point there needs to be something other than just representing sensory inputs in a high-dimensional space.
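The theorem alluded to is the data processing inequality: if the stimulus S, the sensory input X and a central “code” Y form a Markov chain, i.e. Y is computed from X alone,

```latex
S \;\to\; X \;\to\; Y \quad \Longrightarrow \quad I(S;\,Y) \;\le\; I(S;\,X),
```

so a second, central code cannot contain more Shannon information about the stimulus than the sensory inputs from which it is computed.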

The answer, of course, is that the goal of a “neural code” is not just to represent information, but to do it in a way that makes it easier to process relevant information. This is the answer provided by representational theories (e.g. David Marr). Then you might also argue that the very notion of a neural code is misleading because the role of a perceptual system is not to encode sensory inputs but to guide behavior, and therefore it is more appropriate to speak of computation rather than of a code. In either view, the relevant question when interpreting neural responses is not how the rest of the brain can make use of them, but rather how they participate in solving the perceptual problem. I believe one key aspect is behavioral invariance, for example the fact that you can localize a sound source independently of its level (within a certain range). Another key aspect is that the “code” should in some way be easier to decode for “neural observers” (not just for any observer).