The challenge of retrograde amnesia to theories of memory

I am reading Oliver Sacks' “The man who mistook his wife for a hat”. In chapter 2, he describes a case of retrograde amnesia. Around 1970, the patient went through an episode of alcoholism, which resulted in the loss of 25 years of his most recent memories (declarative memory). As a result, the patient thought he was 19 and that the year was 1945, as if time had stopped. So not only could he not transfer short-term memory to long-term memory, but a large part of his previously stored memory was erased. In addition, it was not a random fraction of his memories that was erased: it was exactly the most recent ones. He seemed to remember old memories perfectly, and to have absolutely no memory of the more recent events.

This is quite a challenge for current neural theories of memory. The main theoretical concept about memory is the notion of a neural assembly supporting associative memory. Imagine that a memory is made of a number of elements that are associated together; then the substrate of this memory is a connected network of neurons that “code” for those elements, in some structure in the brain, with connections to relevant parts of the brain (say, the visual cortex for visual features, etc.). This conceptual framework can be extended with sequential activation of neurons. Now in this framework, how do you erase the most recent memories? Note that by “most recent”, I am talking about 25 years, not about short-term memory.

One trivial possibility would be that each memory has a timestamp, encoded as part of the neural assembly supporting that memory. Then some mechanism erases all memories that have a timestamp more recent than a particular date. Why and how this would happen is mysterious. In addition, “reading the timestamp” would entail activating those memories (all of them), which would then need to exist at that time, and then erasing them. It simply sounds absurd. A more plausible explanation is that, for some reason, recent memories are more fragile than old ones. But why is that?

This is a very interesting point, because in current neural theories of memory, it is the old memories that are more fragile than the recent ones. The reason is that memories are imprinted by modifications of synaptic connections according to a Hebbian mechanism (neurons that are co-activated strengthen their connections), and then these connections get degraded over time because of the activation of the same neurons in other contexts, by ongoing activity. So in current theories of memory, memory traces decay over time. But what retrograde amnesia implies is exactly the opposite: memory traces should strengthen over time. How is it possible that memories strengthen over time?

One possibility is that memories are replayed. If you recall a memory, the neurons supporting that memory activate and so the corresponding connections strengthen. But conscious recollection will probably not do the trick, because then there would not be a strict temporal cut-off: i.e., some recent memories might be recalled more often than some older ones. So what seems to be necessary is a continuous subconscious replay of memories, independent of emotional or attentional states. Clearly, this is quite a departure from current neural theories of memory.
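
This idea can be illustrated with a toy simulation (all parameters are made up for illustration): each memory trace decays through interference from ongoing activity, but is also continuously strengthened by replay. Under these assumptions, older traces end up stronger than recent ones, which is exactly the gradient that retrograde amnesia suggests:

```python
def simulate(n_memories=50, steps=1000, decay=0.995, replay_gain=0.01):
    """Toy model: one scalar trace strength per memory.

    Each existing trace decays at every step (interference from ongoing
    activity) but is also strengthened a little by continuous replay.
    """
    birth = [i * (steps // n_memories) for i in range(n_memories)]
    strength = [0.0] * n_memories
    for t in range(steps):
        for i in range(n_memories):
            if t == birth[i]:
                strength[i] = 1.0            # imprinting
            elif t > birth[i]:
                strength[i] *= decay          # interference-driven decay
                strength[i] += replay_gain    # subconscious replay
    return strength

trace = simulate()  # trace[0] is the oldest memory, trace[-1] the most recent
```

With these (hypothetical) numbers, each trace converges towards replay_gain / (1 - decay) = 2, so older traces have had more time to consolidate. A pathological cut-off erasing the weakest traces then removes precisely the most recent memories.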

Rate vs. timing (XXI) Rate coding in motor control

Motor control is sometimes presented as the prototypical example of rate coding. That is, muscle contraction is determined by the firing rate of motoneurons, so ultimately the “output” of the nervous system follows a rate code. This is a very interesting example, precisely because it is actually not an example of coding, which I previously argued is a problematic concept.

I will briefly recapitulate what “neural coding” means and why it is a problematic concept. “Coding” means presenting some property of things in the world (the orientation of a bar, or an image) in another form (spikes, rates). That a neuron “codes” for something means nothing more than that its activity co-varies with that thing. For example, pupillary diameter encodes the amount of light captured by the retina (because of the pupillary contraction reflex). Or blood flow in the primary visual cortex encodes local visual orientation (this is what is actually measured by intrinsic optical imaging). So coding is really about observations made by an external observer; it does not tell us much about how the system works. It is a common source of confusion because when one speaks of neural coding, there is generally the implicit assumption that the nervous system “decodes” it somehow. But presumably the brain does not “read out” blood flow to infer local visual orientation. The coding perspective leaves the interesting part (what is the “representation” for?) largely unspecified, which is the essence of the homunculus fallacy.

The control of muscles by motoneurons does not fit this framework, because each spike produced by a motoneuron has a causal impact on muscle contraction: its activity does not simply co-vary with muscle contraction, it causes it. So first of all, motor control is not an example of rate coding because it is not really an example of coding. But still, we might consider that it conforms to rate-based theories of neural computation. I examine this statement now.

I will now summarize a few facts about muscle control by motoneurons, which are found in neuroscience textbooks. First of all, a motoneuron controls a number of muscle fibers and one fiber is contacted by a single motoneuron (I will only discuss α motoneurons here). There is indeed a clear correlation between muscle force and firing rate of the motoneurons. In fact, each single action potential produces a “muscle twitch”, i.e., the force increases for some time. There is also some amount of temporal summation, in the same way as temporal summation of postsynaptic potentials, so there is a direct relationship between the number of spikes produced by the motoneurons and muscle force.

Up to this point, it seems fair to say that firing rate is what determines muscle force. But what do we mean by that exactly? If we look at muscle tension as a function of time, resulting from a spike train produced by a motoneuron, what we see is a time-varying function that is determined by the timing of every spike. The rate-based view would be that the precise timing of spikes does not make a significant difference to that function. But it does make a difference, although perhaps a small one: for example, the variability of muscle tension is not the same if the spike train is regular (small variability) or random, e.g. Poisson (larger variability). Now this gets interesting: during stationary muscle contraction (no movement), motoneurons generate constant muscle tension and they fire regularly, unlike cortical neurons (for example). Two remarks: 1) this does not at all conform to standard rate-based theory, in which the rate is the intensity of a Poisson process, whereas here there is little stochasticity; 2) regular firing is exactly what motoneurons should be doing to minimize variability in muscle tension. This latter remark is particularly significant. It means that, beyond the average firing rate, spikes occur at precise times that minimize tension variability, and so spikes do matter. Thus motor control rather seems to support spike-based theories.
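
The difference in variability is easy to demonstrate with a toy simulation (every parameter here is hypothetical, chosen only for illustration): each spike produces a twitch, modeled as an alpha function, and twitches summate; a regular train then yields a much smoother tension than a Poisson train at the same mean rate.

```python
import numpy as np

def muscle_tension(spike_times, t, tau=0.05):
    """Tension as the sum of one 'twitch' (alpha function) per spike."""
    tension = np.zeros_like(t)
    for s in spike_times:
        dt = t - s
        m = dt > 0
        tension[m] += (dt[m] / tau) * np.exp(1 - dt[m] / tau)
    return tension

rng = np.random.default_rng(0)
rate, duration = 20.0, 5.0                      # spikes/s, seconds
t = np.arange(0, duration, 1e-3)
regular = np.arange(0, duration, 1.0 / rate)    # regular train
n_spikes = rng.poisson(rate * duration)
poisson = np.sort(rng.uniform(0, duration, n_spikes))  # Poisson train, same mean rate

after_onset = t > 1.0                            # discard the initial transient
std_regular = muscle_tension(regular, t)[after_onset].std()
std_poisson = muscle_tension(poisson, t)[after_onset].std()
```

std_regular comes out much smaller than std_poisson, which is the sense in which regular firing minimizes tension variability at a given mean rate.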

Subjective physics

I just finished writing a text about "subjective physics": a term I made up to designate the description of the laws that govern sensory signals and their relationships with actions. It is relevant to systems computational neuroscience, embodiment theories and psychological theories of perception (in particular Gibson's ecological theory and the sensorimotor theory). Here is the abstract:

Imagine a naive organism who does not know anything about the world. It can capture signals through its sensors and it can make actions. What kind of knowledge about the world is accessible to the organism? This situation is analogous to that of a physicist trying to understand the world through observations and experiments. In the same way as physics describes the laws of the world obtained in this way by the scientist, I propose to name subjective physics the description of the laws that govern sensory signals and their relationships with actions, as observed from the perspective of the perceptual system of the organism. In this text, I present the main concepts of subjective physics, illustrated with concrete examples.

What is computational neuroscience? (XIX) Does the brain process information?

A general phrase that one reads very often about the brain in the context of perception is that it “processes information”. I have already discussed the term “information”, which is ambiguous and misleading. But here I want to discuss the term “process”. Is it true that the brain is in the business of “information processing”?

“Processing” refers to a procedure that takes something and turns it into something else by a sequence of operations, for example trees into paper. So the sentence implies that what the brain is doing is transforming things into other things. For example, it transforms the image of a face into the identity of the face. The coding paradigm, and more generally the information-processing paradigm, relies on this view.

I will take a concrete example. Animals can localize sounds, based on auditory cues such as the level difference between the two ears. In the information processing view, sound localization means a process that takes a pair of acoustic signals and turns it into a value representing the direction of the sound source. However, this is not literally what an animal does.

Let us take a cat. The cat lives and, most of the time, does nothing. Through its ears, it receives a continuous acoustic flow. This flow is transduced into electrical currents, which triggers some activity in the brain, that is, electrical events happening. At some moment in time, a mouse scratches the ground for a second, and the cat turns its eyes towards the source, or perhaps crawls to the mouse. During an extended period of time, the mouse is there in the world, and its location exists as a stable property. What the cat “produces”, on the other hand, is a discrete movement with properties that one can relate to the location of the mouse. Thus, sound localization behavior is characterized by discrete events that occur in a continuous sensory flow. Behavior is not adequately described as a transformation of things into things, because behavior is an event, not a thing: it happens.

The same remark applies to neurons. While a neuron is a thing that exists, a spike is an event that happens. It is a transient change in electrical properties that triggers changes in other neurons. As the term “neural activity” clearly suggests, a spike is not a “thing” but an event, an action on other neurons or muscles. But the notion of information processing implies that neural activity is actually the end result of a process rather than the process itself. There is a confusion between things and events. In a plant that turns trees into paper, trees and paper are the things that are transformed; the action of cutting trees is not one of these things that are transformed. Yet this is what the information processing metaphor says about neural activity.

There are important practical implications for neural models. Traditionally, these models follow the information-processing paradigm. There is an input to the model, for example a pair of acoustical signals, and there is an output, for example an estimate of sound location (I have worked on this kind of model myself, see e.g. Goodman & Brette, PLoS Comp Biol 2010). The estimate is generally calculated from the activity of the neurons over the course of the simulation, which corresponds to the duration of the sound. For example, one could select the neuron with the maximum firing rate and map its index to location; or one could compute an estimate based on population averages, etc. In any case, there is a well-defined input corresponding to a single sound event, and a single output value corresponding to the estimated location.

Now try to embed this kind of model into a more realistic scenario. There is a continuous acoustic flow. Sounds are presented at various locations in sequence, with silent gaps between them. The model must estimate the locations of these sounds. We have a first problem, which is that the model produces estimates based on total activity over time, and this is clearly not going to work here since there is a sequence of sounds. The model could either produce a continuous estimate of source location (the equivalent of continuously pointing to the source), or it could produce an estimate of source location at specific times (the equivalent of making a discrete movement to the source), for example when the sounds stop. In either case, what is the basis for the estimate, since it cannot be the total activity any more? If it is a continuous estimate, how can it be a stable value if neurons have transient activities? More generally, how can the continuous flow of neural activity produce a discrete movement to a target position?

Thus, sound localization behavior is more than a mapping between pairs of signals and direction estimates. Describing perception as “information processing” entails the following steps: a particular time interval of sensory flow is selected and considered as a thing (rather than a flow of events); a particular set of movements is considered and some of its properties are extracted (e.g. direction); what the brain does is described as the transformation of the first thing into the second thing. Thus, it is an abstract construction by an external observer.

Let me summarize this post and the previous one. What is wrong about “information processing”? Two things are wrong. First (previous post), the view that perception is the transformation of information of some kind into information of another kind is self-contradictory, because a signal can only be considered “information” with respect to a perceptual system. This view of perception therefore proposes that there are things to be perceived by something else than the perceptual system. Second (this post), “processing” is the wrong term because actions produced by the brain are not things but events: it is true at the scale of the organism (behavior) and it is true at the scale of neurons (spikes). Both behavior and causes of behavior are constituted by events, not things. It is also true of the mind (phenomenal consciousness). A thing can be transformed into another thing; an event happens.

What is computational neuroscience? (XVIII) Representational approaches in computational neuroscience

Computational neuroscience is the science of how the brain “computes”: how it recognizes faces or identifies words in speech. In computational neuroscience, standard approaches to perception are representational: they describe how neural networks represent in their firing some aspect of the external world. This means that a particular pattern of activity is associated to a particular face. But who makes this association? In the representational approach, it is the external observer. The approach only describes a mapping between patterns of pixels (say) and patterns of neural activity. The key step, of relating the pattern of neural activity to a particular face (which is in the world, not in the brain), is done by the external observer. How then is this about perception?

This is an intrinsic weakness of the concept of a “representation”: a representation is something (a painting, etc) that has a meaning for some observer, it is not about how this meaning is formed. Ultimately, it does not say much about perception, because it simply replaces the problem of how patterns of photoreceptor activity lead to perception by the problem of how patterns of neural activity lead to perception.

A simple example is the neural representation of auditory space. There are neurons in the auditory brainstem whose firing is sensitive to the direction of a sound source. One theory proposes that the sound's direction is signaled by the identity of the most active neuron (the one that is “tuned” to that direction). Another one proposes that it is the total firing rate of the population, which covaries with direction, that indicates sound direction. Yet another theory considers that sound direction is computed as a “population vector”: each neuron codes for a direction, and is associated with a vector oriented in that direction, with a magnitude equal to its firing rate; the population vector is the sum of all these vectors.
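
For concreteness, here is a minimal sketch of the population vector computation, with hypothetical cosine-tuned neurons (the tuning curves and preferred directions are assumptions for illustration, not a model of actual brainstem neurons):

```python
import numpy as np

def population_vector(preferred_deg, rates):
    """Sum of unit vectors along each neuron's preferred direction,
    weighted by its firing rate; the angle of the sum is the estimate."""
    theta = np.deg2rad(preferred_deg)
    x = np.sum(rates * np.cos(theta))
    y = np.sum(rates * np.sin(theta))
    return np.rad2deg(np.arctan2(y, x)) % 360

preferred = np.arange(0, 360, 30)          # hypothetical preferred directions
true_direction = 75.0
# rectified cosine tuning (an assumption): rate depends on angular distance
rates = np.maximum(0.0, np.cos(np.deg2rad(preferred - true_direction)))
estimate = population_vector(preferred, rates)
```

Note that the final step, mapping the resulting angle back to a direction in the world, is still performed here by the external observer (the formula), which is exactly the problem discussed below.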

Implicit in these representational theories is the idea that some other part of the brain “decodes” the neural representation into the sound's direction, which ultimately leads to perception and behavior. However, this part is left unspecified in the model: neural models stop at the representational level, and the decoding is done by the external observer (using some formula). But the postulate of a subsequent neural decoder is problematic. Let us assume there is one. It takes the “neural representation” and transforms it into the target quantity, which is sound direction. But the output of a neuron is not a direction, it is a firing pattern or rate that can perhaps be interpreted as a direction. So how is sound direction represented in the output of the neural decoder? It appears that the decoder faces the same conceptual problem, which is that the relationship between output neural activity and the actual quantity in the world (sound direction) has to be interpreted by the external observer. In other words, the output is still a representation. The representational approach leads to an infinite regress.

Since neurons are in the brain and things (sound sources) are in the world, the only way to avoid an external “decoding” stage that relates the two is to include both the world and the brain in the perceptual model. In the example above, this means that, to understand how neurons estimate the direction of a sound source, one would not look for the “neural representation” of sound sources but for neural mechanisms that, embedded in an environment, lead to some appropriate orienting behavior. In other words, neural models of perception are not complete without an interaction with the world (i.e., without action). In this new framework, “neural representations” become a minor issue, one for the external observer looking at neurons.

Neural coding and the invariance problem

In sensory systems, one of the hardest computational problems is the “invariance problem”: the same perceptual category can be associated with a large diversity of sensory signals. A classical example is the problem of recognizing a face: the same face can appear with different orientations relative to the observer, and under different lighting conditions, and it is a challenge to design a recognition system that is invariant to these sources of variation.

In computational neuroscience, the problem is usually framed within the paradigm of statistical learning theory as follows. Perceptual categories belong to some set Y (the set of faces). Sensory signals belong to some high-dimensional sensory space X (e.g. pixels). Each particular category (a particular face) corresponds to a specific set of signals in X (different views of the face) or to a distribution on X. The goal is to find the correct mapping from X to Y from particular labeled examples (a particular view x of a face, the name y corresponding to that face). This is also the view that underlies the “neural coding” paradigm, where there is a communication channel between Y and X, and X contains “information” about Y.
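
In code, this framing amounts to supervised learning of a mapping from X to Y from labeled pairs. A minimal sketch with a nearest-neighbor “recognizer” on toy feature vectors (the data, dimensions and noise level are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
prototypes = rng.standard_normal((3, 10))     # 3 categories ("faces"), 10-dim signals
# labeled examples: noisy "views" of each prototype, with the category label
train_x = np.vstack([p + 0.1 * rng.standard_normal((20, 10)) for p in prototypes])
train_y = np.repeat(np.arange(3), 20)

def classify(x):
    """Map a point of X to a category of Y via the nearest labeled example."""
    distances = np.linalg.norm(train_x - x, axis=1)
    return int(train_y[np.argmin(distances)])

new_view = prototypes[1] + 0.1 * rng.standard_normal(10)  # an unseen view of face 1
```

The point is only to make the framing explicit: learning is entirely across independent, labeled examples, and time plays no role within an example.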

Framed in this way, this is a really difficult problem in general, and it requires many examples to form categories. However, there is a different way of approaching the problem, which follows from the concept of “invariant structure” developed by James Gibson. It starts with the observation that a sensory system does not receive a static input (an image) but rather a sensory flow. This is obvious in hearing (sounds are carried by acoustic waves, which vary in time), but it is also true of vision: the eyes are constantly moving even when fixating an object (e.g. high-frequency tremors). A perceptual system is looking for things that do not vary within this sensory flow, the “invariant structure”, because this is what defines the essence of the world.

I will develop the example of sound localization. When a source produces a sound, there are time-varying acoustical waves propagating in the air, possibly reaching the ears of a listener. The input to the auditory system is two time-varying signals. Through the sensory flow, the identity and spatial location of the source are unchanged. Therefore, any piece of information about these two things must be found in properties of the auditory signals that are invariant through the sensory flow. For example, if we neglect sound diffraction, the fact that one signal is a delayed copy of the other, with a particular delay, is true as long as the sound exists. An invariant property of the acoustic signals is not necessarily about the location of the sound source. It could be about the identity of the source, for example (the speaker). However, if that property is no longer invariant when movements are produced by the organism, then it cannot be an intrinsic property of the source, but rather something about the relative location of the sound source.
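
A small sketch can make this concrete. Under the idealized pure-delay assumption (no diffraction), the delay recovered from the two signals is the same whatever the source happens to emit, so it is an invariant of the sensory flow tied to location rather than to the source's identity:

```python
import numpy as np

def best_delay(right, left):
    """Sample lag at which the right signal best matches a delayed left signal."""
    c = np.correlate(right, left, mode="full")
    return int(np.argmax(c)) - (len(left) - 1)

rng = np.random.default_rng(1)
d = 5   # acoustic delay in samples: determined by the (fixed) source location

# two completely different source signals, same location
delays = []
for _ in range(2):
    left = rng.standard_normal(1000)
    right = np.roll(left, d)   # idealized propagation: pure delay, no diffraction
    delays.append(best_delay(right, left))
```

Both trials recover the same delay d although the waveforms themselves share nothing: the invariant property characterizes the location, not the sound.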

In this framework, the computational problem of sound localization has two stages: 1) for any single example, pick up an acoustical invariant that is affected by head movements, 2) associate these acoustical invariants with sound location (either externally labeled, or defined by head movements). The second stage is essentially the computational problem defined in the neural coding/statistical learning framework. But the first stage is entirely different. It is about finding an invariant property within a single example, and this only makes sense if there is a sensory flow, i.e., if time is involved within a single example and not just across examples.

There is a great benefit in this approach, which is to solve part of the invariance problem from the beginning, before any category is assigned to an example. For example, a property about the binaural structure produced by a broadband sound source at a given position will also be true for another sound source at the same position. In this case, the invariance problem has disappeared entirely.

Within this new paradigm, the learning problem is now: given a set X of time-varying sensory signals produced by sound sources, how to find a mapping from X to some other space Y such that the images of sensory signals through this mapping do not vary over time, but vary across sources? Phrased in this way, this is essentially the goal of slow feature analysis. However, the slow feature analysis algorithm is a machine learning technique, whose biological instantiation is not straightforward.
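
The core of slow feature analysis can be sketched in a few lines: among all unit-variance linear projections of the input, pick the one whose temporal derivative has the smallest variance. Here is a toy instance (the signals and the mixing are made up) where the slow latent variable is recovered from two fast-varying mixtures:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000)
slow = np.sin(t)                          # slowly varying latent ("source property")
fast = rng.standard_normal(t.size)        # quickly varying nuisance
X = np.column_stack([slow + 0.5 * fast, slow - 0.5 * fast])
X -= X.mean(axis=0)

# whiten, then the slowest unit-variance projection is the eigenvector of the
# derivative covariance with the smallest eigenvalue (the core of SFA)
U, _, _ = np.linalg.svd(X, full_matrices=False)
Z = U * np.sqrt(len(X))                   # whitened data: identity covariance
dZ = np.diff(Z, axis=0)
_, eigvecs = np.linalg.eigh(dZ.T @ dZ / len(dZ))
slow_feature = Z @ eigvecs[:, 0]          # projection with slowest variation
```

slow_feature is, up to sign and scale, the hidden sin(t), although neither input channel is slow by itself.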

There have been similar ideas in the field. In a highly cited paper, Péter Földiák proposed a very simple unsupervised Hebbian rule based on related considerations (Földiák, Neural Comp 1991). The study focused on the development of complex cells in the visual system, which respond to edges independently of their location. The complex cell combines inputs from simple cells, which respond to specific edges, and the neuron must learn the right combination. The invariance is learned by presenting moving edges, that is, it is looked for within the sensory flow and not across independent examples. The rule is very simple: it is a Hebbian rule in a rate-based model, where the instantaneous postsynaptic activity is replaced by a moving average. The idea is simply that, if the output must be temporally stable, then the presynaptic activity should be paired with the output at any time. Another paper by Schraudolph and Sejnowski (NIPS 1992) is actually about finding the “invariant structure” (with no mention of Gibson) using an anti-Hebbian rule, but this means that neurons signal the invariant structure by not firing, which is not what neurons in the MSO seem to be doing (although perhaps the idea might be applicable to the LSO).
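
The trace rule is simple enough to sketch. In this toy version (the network size, rates and stimulus are made up, and it is a sketch in the spirit of the rule rather than a reproduction of the published model), a single output unit is repeatedly stimulated by a “moving edge” that activates simple cells 0, 1, 2 in sequence; because the Hebbian term uses a moving average of the output rather than its instantaneous value, the weights of all phases of the sequence grow together, while never-activated inputs decay:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, eta, delta = 6, 0.1, 0.2
w = rng.uniform(0, 0.1, n_inputs)    # small random initial weights
ybar = 0.0                           # moving average ("trace") of the output

for sweep in range(200):
    for i in range(3):               # moving edge: inputs 0, 1, 2 in sequence
        x = np.zeros(n_inputs)
        x[i] = 1.0
        y = float(w @ x)
        ybar = (1 - delta) * ybar + delta * y
        w += eta * ybar * (x - w)    # Hebbian rule using the trace, not y
        w = np.clip(w, 0.0, None)
```

After learning, the unit responds to every phase of the trained sequence (w[0:3] are all well above zero) and not to the unused inputs (w[3:6] have decayed): it has become invariant to the position of the edge within the sequence.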

There is a more recent paper in which slow feature analysis is formally related to Hebbian rules and to STDP (Sprekeler et al., PLoS CB 2007). Essentially, the argument is that minimizing the temporal variation of the output is equivalent to maximizing the variance of the low-pass filtered output. In other words, they provide a link between slow feature analysis and Földiák's simple algorithm. There are also constraints, in particular the synaptic weights must be normalized. Intuitively this is obvious: to aim for a slowly varying output is the same thing as to aim for increasing the low-frequency power of the signal. The angle in the paper is rather on rate models, but it gives a simple rationale for designing learning rules that promote slowness. In fact, it appears that the selection of slow features follows from the combination of three homeostatic principles: maintaining a target mean potential and a target variance, and minimizing the temporal variation of the potential (through maximizing the variance of the low-pass filtered signal). The potential may be replaced by the calcium trace of spike trains, for example.
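
The equivalence is easy to check numerically. Below, two unit-variance toy signals (a slow one and a fast one) are compared on the two criteria, temporal variation (mean squared derivative) and variance after low-pass filtering; the slow signal wins on both, in opposite directions (the filter and signals are arbitrary choices for illustration):

```python
import numpy as np

def lowpass(y, alpha=0.1):
    """First-order low-pass filter (exponential moving average)."""
    out = np.empty_like(y)
    acc = 0.0
    for i, v in enumerate(y):
        acc += alpha * (v - acc)
        out[i] = acc
    return out

def temporal_variation(y):
    """Mean squared temporal derivative: the quantity slowness minimizes."""
    return np.mean(np.diff(y) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 1000)
slow_sig = np.sin(t) / np.std(np.sin(t))   # unit variance, slow
fast_sig = rng.standard_normal(t.size)
fast_sig /= fast_sig.std()                 # unit variance, fast

tv_slow, tv_fast = temporal_variation(slow_sig), temporal_variation(fast_sig)
lp_slow, lp_fast = lowpass(slow_sig).var(), lowpass(fast_sig).var()
```

tv_slow < tv_fast while lp_slow > lp_fast: at fixed total variance, minimizing the temporal variation and maximizing the low-pass-filtered variance select the same signals.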

It is relatively straightforward to see how this might be applied for learning to decode the activity of ITD-sensitive neurons in the MSO into the location of the sound source. For example, a target neuron combines inputs from the MSO into the membrane potential, and the slowness principle is applied to either the membrane potential or the output spike train. As a result, we expect the membrane potential and the firing rate of this neuron to depend only on sound location. These neurons could be in the inferior colliculus, for example.

But can this principle be also applied to the MSO? In fact, the output of a single neuron in the MSO does not depend only on sound location, even for those neurons with a frequency-dependent best delay. Their output also depends on sound frequency, for example. But is it possible that their output is as slow as possible, given the constraints? It might be so, but another possibility is that only some property of the entire population is slow, and not the activity of individual neurons. For example, in the Jeffress model, only the identity of the maximally active neuron is invariant. But then we face a difficult question: what learning criterion should be applied at the level of an individual neuron so that there is a slow combination of the activities of all neurons?

I can imagine two principles. One is a backpropagation principle: the value of the criterion in the target neurons, i.e., slowness, is backpropagated to the MSO neurons and acts as a reward. The second is that the slowness criterion is applied at the cellular level not to the output of the cell, but to a signal representing a combined activity of neurons in the MSO, for example the activity of neighboring neurons.

What does it mean to represent a relation?

In this blog, I have argued many times that if there are neural representations, these must be about relations. For example, a relation between two sensory signals, or about a potential action and the effect on the sensory signals. But what does it mean exactly that something (say neural activity) “represents” a relation? It turns out that the answer is not so obvious.

The classical way to understand it is to consider that a representation is something (an event, a number of spikes, etc) that stands for the thing being represented. That is, there is a mapping between the thing being represented and the thing that represents it. For example, in the Jeffress model of sound localization, the identity of the most active binaural neuron stands for the location of the sound, or in terms of relation, for the fact that the right acoustical signal is a delayed copy of the left acoustical signal, with a specific delay. The difficulty here is that a representation always involves three elements: 1) the thing to be represented, 2) the thing that represents it, 3) the mapping between the first two things. But in the classical representational view, we are left with only the second element. In what sense does the firing of a binaural neuron tell us that there is such a specific relation between the monaural signals? Well it doesn’t, unless we already know in advance that this is what the firing of that neuron stands for. But from observing the firing of the binaural neurons, there is no way we can ever know that: we just see neurons lighting up sometimes.
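
A minimal sketch of the Jeffress scheme makes the point concrete (all parameters hypothetical): a bank of coincidence detectors, each receiving the left input through a different internal delay, with the most active one being the one whose delay compensates the acoustic delay.

```python
import numpy as np

def jeffress_activity(left, right, delays):
    """One coincidence detector per internal delay d: it combines the left
    signal delayed by d with the right signal, so it is most active when
    d matches the acoustic interaural delay."""
    n = len(left)
    return np.array([np.sum(left[: n - d] * right[d:]) for d in delays])

rng = np.random.default_rng(2)
itd = 7                                        # acoustic delay, in samples
left = rng.standard_normal(2000)
right = np.concatenate([np.zeros(itd), left[:-itd]])   # right = delayed left

delays = np.arange(20)
activity = jeffress_activity(left, right, delays)
most_active = int(delays[np.argmax(activity)])  # the "labeled line"
```

most_active does equal the acoustic delay, but only an observer who already knows the labeling can read the firing of that neuron as “the right signal is the left signal delayed by 7 samples”; the activity itself is just neurons lighting up.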

There are different ways to address this issue. The simplest one is simply to say: it doesn’t matter. The activity of the binaural neurons represents a relationship between the monaural neurons, at least for us external observers, but the organism doesn’t care: what matters is that their activity can be related to the location of the sound source, defined for example as the movement of the eyes that put the sound source in the fovea. In operational terms, the organism must be able to take an action conditionally to the validity of a given relation, but what this relation exactly is in terms of the acoustical signals doesn’t matter.

An important remark is in order here. There is a difference between representing a relation and representing a quantity (or vector), even in this simple notion of representation. A relation is a statement that may be true or not. This is different from a quantity resulting from an operation. For example, one may always calculate the peak lag in the cross-correlation function between two acoustical signals, and call this “the ITD” (interaural time difference). But such a number is obtained whether there is a source, several sources or no source at all. Thus, this is not the same as relations of the form: the right signal equals the left signal delayed by 500 µs. Therefore, we are not speaking of a mapping between acoustical signals and action, which would be unconditional, but of actions conditional to a relation in the acoustical signals.
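
The distinction is easy to demonstrate in a toy sketch: the cross-correlation peak below returns a lag for any pair of signals, including two independent noises for which no relation of the delayed-copy form holds.

```python
import numpy as np

def peak_lag(right, left):
    """Lag of the cross-correlation peak: always defined,
    whether or not the signals are actually related."""
    c = np.correlate(right, left, mode="full")
    return int(np.argmax(c)) - (len(left) - 1)

rng = np.random.default_rng(3)
n = 1000
left = rng.standard_normal(n)
unrelated = rng.standard_normal(n)                     # no source: independent signals
delayed = np.concatenate([np.zeros(10), left[:-10]])   # a true delayed copy

lag_related = peak_lag(delayed, left)     # the relation holds, with delay 10
lag_unrelated = peak_lag(unrelated, left) # some number comes out, meaning nothing
```

The function always produces a number; only in the first case is the relation “the right signal equals the left signal delayed by that lag” actually true.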

Now there is another way to understand the phrase “representing a relation”, which is in a predictive way: if there is a relation between A and B, then representing the relation means that from A and the relation, it is possible to predict B. For example: saying that the right signal is a delayed copy of the left signal, with delay 500 µs, means that if I know that the relation is true and I have the left signal, then I can predict the right signal. In the Jeffress model, or in fact in any model that represents the relation in the previous sense, it is possible to infer the right signal from the left signal and the representation, but only if the meaning of that representation is known, i.e., if it is known that a given neuron firing stands for “B comes 500 µs after A”. This is an important distinction with the previous notion of representation, where the meaning of the relation in terms of acoustics was irrelevant.

We now have a substantial problem: where does the meaning of the representation come from? The firing of binaural neurons in itself does not tell us anything about how to reconstruct signals. To see the problem more clearly, imagine that the binaural neurons develop by selecting axons from both sides. In the end there is a set of binaural neurons whose firing stands for binaural relations with different ITDs. But by just looking at the activity of the binaural neurons after development, or at the activity of both the binaural neurons and the left monaural acoustical signal, it is impossible to know what the ITD is, or what the right acoustical signal is at any time. To be able to do this, one actually needs to have learned the meaning of the representation carried by the binaural neurons, and this learning seems to require both monaural inputs.

It now seems that this second notion of representation is not very useful, since in any case it requires all terms of the relation. This brings us to the notion that to represent a relation, and not just a specific instantiation of it (i.e., that these particular signals have such a property), it must be represented in a sense that may apply to any instantiation. For example, if I know that a source is at a given location, I can imagine for any left signal what the right signal should be. Or, given a signal on the left, I can imagine what the right signal should be if the source were at a given location.

I’m ending this post with probably more confusion than when I started. This is partly intended. I want to stress here that once we start thinking of perceptual representations in terms of relations, then classical notions of neural representations quickly seem problematic or at least insufficient.

Information about what?

In a previous post, I pointed out that the word “information” is almost always used in neuroscience in the sense of information theory, and this is a very restricted notion of information that leads to dualism in many situations. There is another way to look at this issue, which is to ask the question: information about what?

In discussions of neural information or “codes”, there are always three elements involved: 1) what carries the information, e.g. neural electrical activity, 2) what the information is about (e.g. the orientation of a bar), 3) the correspondence between the first two elements. If dualism is rejected, then all the information an organism ever gets from the world must come from its own senses (and the effect of actions on them). Therefore, if one speaks of information for the organism, as opposed to information for an external observer, then the key point to consider is that the information should be about something intrinsic to the organism. For example, it should not be about an abstract parameter of an experimental protocol.

So what kind of information are we left with? For example, there may be information in one sensory signal about another sensory signal, in the sense that one can be (partially) predicted from the other. Or there can be information in a sensory signal about the future signal. This is equivalent to saying that the signals follow some law, a theme developed for example by Gibson (invariant structure) and O’Regan (sensorimotor contingency).

One might think that this conception of information would imply that we can’t know much about the world. But this is not true at all, because there is knowledge about the world coming from the interaction of the organism with the world. Consider space, for example. A century ago, Poincaré noted that space, with its topology and structure, can be entirely defined by the effect of our own movements on our senses. To simplify, assume that the head and eyes are fixed in our body and that we can move only by translations, although possibly through complex movements. We can go from point A to point B through some movements. Points A and B differ in their visual inputs. Movements act on the set of points (= visual inputs) as a group action that has the structure of a two-dimensional Euclidean space (for example, for each movement there is an opposite movement that takes you back to the previous point, combinations of movements are commutative, etc.). This defines space as a two-dimensional affine space. In fact, Poincaré (and also Einstein) went further and noted that space as we know it is necessarily defined with respect to an observer. For example, we can define absolute Cartesian coordinates for space, but the reference point is arbitrary: only relative statements are actually meaningful.
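Poincaré’s observation can be given a minimal formal sketch, with points standing for visual inputs and movements acting on them as translations (a toy formalization, not a model of vision; all values are invented):

```python
# A "point" stands for a visual input; a "movement" acts on points by translation.
def act(movement, point):
    (mx, my), (px, py) = movement, point
    return (px + mx, py + my)

def compose(m1, m2):
    """Doing m1 then m2 is itself a movement."""
    return (m1[0] + m2[0], m1[1] + m2[1])

def inverse(m):
    """The opposite movement, which takes you back."""
    return (-m[0], -m[1])

a = (1.0, 2.0)   # one movement
b = (-3.0, 0.5)  # another movement
p = (0.0, 0.0)   # a point (a visual input)

# The group structure Poincaré pointed to:
assert act(compose(a, b), p) == act(b, act(a, p))  # composition is a movement
assert act(a, act(b, p)) == act(b, act(a, p))      # movements commute
assert act(inverse(a), act(a, p)) == p             # each movement can be undone
```

The assertions spell out exactly the group properties mentioned in the text: movements compose, commute, and can be undone.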

In summary, it is not so much that the concept of information or code is completely irrelevant in itself. The issues arise when one speaks of codes about something external to the organism. In the end, this is nothing else than a modern version of dualism (as Dennett pointed out with his “Cartesian theater”). Rejecting dualism implies that any information relevant to an organism must be about something that the organism can do or observe, not about what an external observer can define.

Complex representations and representations of complex things

In a previous post, I noted that the concept of neural assembly is limited by the fact that it does not represent relations. But this means that it is not possible to represent in this way a complex thing such as a car or a face. This might seem odd, since many authors claim that there are neurons or groups of neurons that code for faces (in IT). I believe there might again be some confusion between representation and information in Shannon’s sense. What is meant when it is stated that an assembly of neurons codes for a face is that its activity stands for the presence of a face in the visual field. So in this sense the complex thing, the face, is represented, but the representation itself is not complex. With such a concept of representation, complex things can only be represented by removing all complexity.

This is related to the problem of invariant representations. How is it that we can recognize a face under different viewpoints, different lighting conditions, and possibly changes in hair style and facial expression? One answer is that there must be a representation that is invariant, i.e., a neural assembly that codes for the concept “Paul’s face” independently of the specific way it can appear. However, this is an incomplete answer, for when I see Paul’s face, I can recognize that it’s Paul, but I can also see that he smiles, that I’m looking at him from the side, that he has dyed his hair black. It’s not that by some process I have managed to remove all the details that are not constitutive of the identity of Paul’s face; rather, I am seeing everything that makes Paul’s face, both in the way it usually appears and in the specific way it appears this time. So the fact that we can recognize a complex thing in an invariant way does not mean that the complexity itself is discarded. In reality we can still register this complexity, and our mental representation of a complex thing is indeed complex. As I argued before, the concept of neural assembly is too crude to capture such complexity.

The concept of invariance is even more interesting when applied to categories of objects, for example chairs. In contrast with Paul’s face, different chairs are not just different viewpoints on the same physical object: they really are different physical objects. They can have different colors, widely different shapes and materials. They usually have four legs, but surely we would recognize a three-legged chair as such. What really makes a chair is that one can sit on it and rest one’s back against it. This is related to Gibson’s concept of “affordances”. Gibson argued that we perceive the affordances of things, i.e., the possibilities of interaction with them.

So now I could imagine that there is an assembly of neurons that codes for the category “chair”. This is fine, but this is only something that stands for the category, it does not describe what this category is. It is not the representation of an affordance. Representing it would involve representing the potential action that one could make with that object. I do not know what kind of neural representation would be adequate, but it would certainly be more complex (i.e., structured) than a neural assembly.

On the notion of information in neuroscience

In a previous post, I criticized the notion of “neural code”. One of my main points was that information can only make sense in conjunction with a particular observer. I am certainly not the first one to make this remark: for example, it is presented in a highly cited review by deCharms and Zador (2000). More recently, Buzsaki defended this point of view in a review (Neuron, 2010), and from the notes in the supplemental material, it appears that he is clearly more philosophically lucid than the average neuroscientist on these issues (check the first note). I want to come back to this issue in more detail.

When one speaks of information or code in neuroscience, it is generally meant in the sense of Shannon. This is a very specific notion of information coming from communication theory. There is an emitter who wants to transmit some message to a receiver. The message is transmitted in an altered form called a “code”, for example Morse code, which contains “information” insofar as it can be “decoded” by the receiver into the original message. The metaphor is generally carried to neuroscience in the following form: there are things in the external world that are described in some way by the experimenter, for example bars with a variable orientation, and the activity of the nervous system is seen as a “code” for this description. It may carry “information” about the orientation of the bar insofar as one can reconstruct the orientation from the neural activity.

It is important to realize how limited this metaphor is, and indeed that it is a metaphor. In a communication channel, the two ends agree upon a code, for example on the correspondence between letters and Morse code. For the receiving end, the fact that the message is information in the common sense of the word relies on two things: 1) that the correspondence is known, and 2) that the initial message itself makes sense for the receiver. For example, imagine that, a few centuries ago, someone is given a papyrus with ancient Egyptian hieroglyphs. It will probably represent very little information for that person, because she has no way to make sense of it. The papyrus becomes informative with the Rosetta stone, on which the same text is written in both ancient Egyptian and ancient Greek, so that the papyrus can be translated into ancient Greek. But of course this becomes information only if ancient Greek makes sense for the person who reads it!

So the metaphor of a “neural code”, understood in Shannon’s sense, is problematic in two ways: 1) the experimenter and the nervous system obviously do not agree upon a code, and 2) how the original “message” makes sense for the nervous system is left entirely unspecified. I will give another example to make this clearer. Imagine you have a vintage (non-digital) thermometer, but one without any graduation. You could replace the thermometer by the activity of a temperature-sensitive neuron. From the point of view of information theory, there is just as much information about temperature in the liquid level as if the temperature were given as a number of Celsius degrees. But clearly, for an observer, there is very little information, because one does not know the relationship between the level of the liquid and the physical temperature, so the level is essentially useless.

Perhaps one could say that the level says something relative about temperature, that is, whether one temperature is hotter than another. But even this is not true, because it relies on the prior knowledge that the level of the liquid increases when the temperature increases, a physical law that is not obvious at all. So to make sense of the liquid level, one would actually rely on association with other sources of information that are not given by the thermometer, e.g. that at some level one feels cold and that at another level one feels hot. But this means that the information in the liquid level is actually limited (and in fact defined) not by the “communication channel” (how accurate the thermometer is) but by the external source of knowledge that provides meaning to the liquid level. This limitation comes from the fact that at no moment in time is the true temperature in Kelvin given as an objective truth to the observer. The only way the observer gets information is through its own sensors.
This is why Shannon’s information is highly misleading as a metaphor for information in biological systems: there can be no agreed code between the environment and the organism. The organism has to learn ancient Egyptian just with hieroglyphs.
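The thermometer example can be made quantitative. In Shannon’s sense, mutual information is unchanged by any invertible relabeling of the liquid level, which is exactly why it cannot capture what the level means for the observer. A minimal sketch (the temperatures and the linear “physical law” are invented for illustration):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information (in bits) of an empirical joint distribution."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Temperatures (in Celsius) and the unlabeled liquid level they produce
temps = [0, 10, 20, 30, 0, 10, 20, 30]
level = [0.3 * t + 5.0 for t in temps]  # physical law, unknown to the observer
relabeled = [-(x ** 3) for x in level]  # an arbitrary invertible relabeling

# Shannon information is the same whether the level is expressed in
# millimetres, in degrees, or arbitrarily relabeled:
print(mutual_information(list(zip(temps, level))))      # 2.0 bits
print(mutual_information(list(zip(temps, relabeled))))  # 2.0 bits
```

The two numbers are identical: nothing in the Shannon quantity distinguishes a meaningful scale from a scrambled one, so the “meaning” must come from elsewhere.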

To finish with this example, imagine now that the thermometer is graduated, so you can read the temperature. Wouldn’t this provide the objective information that was previously missing? As a matter of fact, not really. For example, as a European, if I am given the temperature in Fahrenheit degrees, I have no idea whether it is hot or cold. So the situation is not different for me than previously. Of course if I am also given the correspondence between Fahrenheit and Celsius, then it will start making sense for me. But how can Celsius degrees make sense for me in the first place? Again these are just numbers with arbitrary units. Celsius degrees make sense because they can be related to physical processes linked with temperature: water freezes at 0° and boils at 100°. Presumably, the same thing applies to our perception of temperature: the body senses a change in firing rate of some temperature-sensitive neuron, and this becomes information about temperature because it can be associated with a number of biophysical processes linked with temperature, say sweating, and all these effects can be noticed. In fact, what this example shows is that the activity of the temperature-sensitive neuron does not provide information about physical temperature (number of Kelvin degrees), but rather about the occurrence of various other events that can be captured with other sensors. This set of relationships between events is, in a way, the definition of temperature for the organism, rather than some number in arbitrary units.

Let us summarize. In Shannon’s information theory, it is implicitly assumed that there are two ends in a communication channel, and that 1) both ends agree upon a code, i.e., a correspondence between descriptive elements of information on both ends, and 2) the initial message on the emitter end makes sense for the observer at the other end. Neither of these two assumptions applies to a biological organism, because there is only one end. All the information that the organism can ever get about the world comes from that end, and so in this context Shannon’s information only makes sense for an external observer who can see both ends. A typical error coming from the failure to realize this fact is to greatly overestimate the information in neural activity about some experimental quantity. I discussed this specific point in detail in a recent paper. The overestimation comes simply from the fact that detailed knowledge about the experiment is implicitly assumed on the part of the nervous system.

Followed to its logical conclusions, the information-processing line of reasoning leads to what Daniel Dennett called the “Cartesian theater”. If neural activity gives information about the world in Shannon’s sense, then this means that at some final point this neural activity has to be analyzed and related to the external world. Indeed if this does not happen, then we cannot be speaking about Shannon information, for there is no link with the initial message. So this means that there is some critical stage at which neural activity is interpreted in objective terms. As Dennett noted, this is conceptually not very far from the dualism of Descartes, who thought that there is a non-material mind that reads the activity of the nerves and interprets it in terms of the outside physical world. The “Cartesian theater” is the brain seen as a screen where the world is projected, that a homunculus (the mind) watches.

Most neuroscientists reject dualism, but if one is to reject dualism, then there must be no final stage at which the observer end of the communication channel (the senses) is put in relationship with the emitter end (the world). All information about the world must come from the senses, and the senses alone. Therefore, this “information” cannot be meant in Shannon’s sense.

This, I believe, is essentially what James Gibson meant when he criticized the information-processing view of cognition. It is also related to Hubert Dreyfus’s criticism of artificial intelligence. More recently, Kevin O’Regan made similar criticisms. In his most cited paper with Noë (O’Regan and Noë, BBS 2001), there is an illuminating analogy, the “villainous monster”. Imagine you are exploring the sea with an underwater vessel. But a villainous monster has mixed up all the cables, so that the sensors and actuators are now related to the external world in a completely new way. How can you know anything about the world? The only way is to analyze the structure of the sensor data and their relationships with the actions that you can perform. So if one rejects dualism, then this is the kind of information that is available to the nervous system. A salient feature of this notion of information is that, contrary to Shannon’s information, it is defined not as numbers but as relations or statements: if I do action A, then sensory property B happens; if sensory property A happens, then another property B will happen next; if I do action A in sensory context B, then C happens.
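These final statements can be sketched in code: a toy “villainous monster” scenario in which the only information available to the vessel is a collection of statements of the form “if I do action a, then sensory property s happens” (the channel count and the permutation are arbitrary choices for illustration):

```python
import random

# The monster's permutation of the cables, unknown to the explorer
random.seed(0)
n_channels = 4
wiring = list(range(n_channels))
random.shuffle(wiring)

def sense_after(action):
    """Which sensory channel responds after a given actuator command."""
    return wiring[action]

# The vessel cannot decode anything into external labels; it can only
# collect conditional statements relating its own actions to its own senses:
statements = {a: sense_after(a) for a in range(n_channels)}
for a, s in statements.items():
    print(f"if I do action {a}, then sensory property {s} happens")
```

Nothing in these statements refers to the external world as an observer would describe it; yet their structure is all the vessel needs, and all it can ever have.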


Philosophy of knowledge

We have concluded that, if dualism is to be rejected, then the right notion of information for a biological organism is in terms of statements. This makes the problem of perception quite similar to that of science. Science is made of universal statements, such as the law of gravitation. But not all statements are scientific, for example “there is a God”. In philosophy of knowledge, Karl Popper proposed that a scientific statement is one that can potentially be falsified by an observation, whereas a metaphysical statement is a statement that cannot be falsified. For example, the statement “all penguins are black” is scientific, because I could imagine that one day I see a white penguin. On the other hand, the statement “there is a God” is metaphysical, because there is no way I can check. Closer to the matter of this text, the statement “the world is actually five-dimensional but we live in a three-dimensional subspace” is also metaphysical because independently of whether it is true or not, we have no way to confirm it or falsify it.

To come back to the matter of this text, I propose to qualify as metaphysical for an organism all knowledge that cannot be falsified, given the senses and possibilities for action. For example, in an experiment, one could relate the firing rate of a neuron with the orientation of a bar presented in front of the eyes. There is information in Shannon’s sense about the orientation in the firing rate. This means that we can “decode” the firing rate into the parameter “orientation”. However this decoding requires metaphysical knowledge because “orientation” is defined externally by the experimenter, it does not come out from the neuron’s activity itself. From the neuron’s point of view, there is no way to falsify the statement “10 Hz means horizontal bar”, because the notion of horizontal (or bar) is either defined in relation to something external to the neuron, or by its activity itself (horizontal is when the activity is 10 Hz) and in this latter case the statement is a tautology.

Therefore it appears that there can be very little information without metaphysical knowledge in the response of a single neuron, or in its input. Note that it is not completely empty, for there could be information about the future state of the neuron in the present state.


The structure of information and “neural assemblies”

When information is understood as statements rather than numbers to be decoded, it appears that information to be represented by the brain is much richer than implied by the usual notion inspired by Shannon’s communication theory. In particular, the problem of perception is not just to relate a vector of numbers (e.g. firing rates) to a particular set of parameters representing an object in the world. What is to be perceived is much richer than that. For example, in a visual scene, there could be Paul, a person I know, wearing a new sweater, sitting in a car. What is important here is that a scene is not just a “bag of objects”: objects have relationships with each other, and there are many possible different relationships. For example there is a car and there is Paul, and Paul is in a specific relationship with the car, that of “sitting in it”.

Unfortunately, this does not fit well with the concept of “neural assemblies”, which is the mainstream assumption about how the things we perceive are represented in the brain. If it is true that any given object is represented by the firing of a given assembly of neurons, then several objects should be represented by the firing of a bigger assembly of neurons, the union of the assemblies for the individual objects. Several authors have noted that this may lead to the “superposition catastrophe”: there may be different sets of objects whose representations are fused into the same big assembly. But let us assume that this problem has somehow been solved and that there is no possible confusion. Still, the representation of a scene can then be nothing more than an unstructured “bag of objects”: there are no relationships between objects in the assembly representation. One way to save the assembly concept is to consider that there are combination assemblies, which code for specific combinations of things, perhaps in a particular relationship. But this cannot work if it is the first time I see Paul in that sweater. There is a fundamental problem with the concept of neural assembly, which is that there is no representation of relations, only of things to be related. In analogy with language, there is no syntax in the concept of neural assemblies. This is actually the analogy chosen by Buzsaki in his recent Neuron review (2010).
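The superposition catastrophe is easy to make concrete if assemblies are idealized as sets of neuron indices (a deliberately tiny toy example; the objects and indices are invented):

```python
# Hypothetical assemblies: each object is coded by a set of neuron indices.
assemblies = {
    "Paul":    {1, 2, 3},
    "car":     {4, 5, 6},
    "sweater": {1, 2, 6},
    "chair":   {3, 4, 5},
}

def scene(*objects):
    """Assembly representation of a scene: the union of the objects' assemblies."""
    return set().union(*(assemblies[o] for o in objects))

# Two different scenes fuse into the very same big assembly:
assert scene("Paul", "car") == scene("sweater", "chair") == {1, 2, 3, 4, 5, 6}

# And even when there is no such confusion, the union is an unstructured
# "bag of objects": nothing in scene("Paul", "car") says whether Paul
# is sitting in the car or standing next to it.
```

The assertion shows the catastrophe itself; the closing comment shows the deeper problem of the text: even a confusion-free union carries no relations.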

This remark, mostly made in the context of the binding problem, has led authors such as von der Malsburg to postulate that synchrony is used to bind the features of an object, as represented by neural firing. This avoids the superposition catastrophe because at a given time, only one object is represented by neural firing. It also addresses the problem of composition: by defining different timescales for synchrony, one may build representations for objects composed of parts, possibly in a recursive manner. However, the analogy with language shows that this is not going to be enough, because only one type of relation can be represented in this way. But the same analogy also shows that it is conceptually possible to represent structures as complex as linguistic structure by using time, in analogy with the flow of a sentence. Just for the sake of argument, and I do not mean that this is a plausible proposition (although it could be), you could imagine that assemblies code either for things (Paul, a car, a sweater) or for relations between things (sitting, wearing), that only one assembly is active at a time, and that the order of activation indicates which things a relation applies to. Here not only is synchrony important, but also the order of spikes. This idea is quite similar to Buzsaki’s “neural syntax” (based on oscillations), but I would like to emphasize a point that I believe has not been noticed: that assemblies must stand not only for things but also for relations between things (note that “things” can also be thought of as relations, in which case we are speaking of relations of different orders).
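The speculative order-based scheme described above can be sketched as follows, with assemblies reduced to symbolic labels (purely illustrative, not a claim about neural implementation):

```python
def encode(relation, subject, obj):
    """Temporal code: one assembly active at a time, and the order of
    activation carries the syntax (relation first, then its arguments)."""
    return [relation, subject, obj]

def decode(sequence):
    relation, subject, obj = sequence
    return f"{subject} --{relation}--> {obj}"

s1 = encode("sitting-in", "Paul", "car")
s2 = encode("sitting-in", "car", "Paul")  # same assemblies, different order

assert set(s1) == set(s2) and s1 != s2    # same "big assembly", different message
print(decode(s1))  # Paul --sitting-in--> car
print(decode(s2))  # car --sitting-in--> Paul
```

Both sequences activate exactly the same three assemblies, so a pure assembly (“bag”) reading cannot distinguish them; the temporal order alone carries the syntax.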

All this discussion, of course, is only meant to save the concept of neural assembly, and perhaps one might simply consider that a completely different concept should be looked for. I do not discard this more radical possibility. However, I note that if it is admitted that neurons interact mostly with spikes, then somehow the spatio-temporal pattern of spikes is the only way that information can be carried. Unless, perhaps, we are completely misled by the notion of “information”.