What is computational neuroscience? (X) Reverse engineering the brain

One phrase that occasionally pops up when speaking of the goal of computational neuroscience is “reverse engineering the brain”. This is quite an interesting phrase from an epistemological point of view. The analogy is to see the brain as an engineered device, the “engineer” being evolution, of which we do not possess the design plans. We are supposed to understand it by opening it, and trying to guess what mechanisms are at play.

What is interesting is that observing and trying to understand the mechanisms is basically what science is about, not only neuroscience, so there must be something else in this analogy. For example, we would not describe the goal of astronomy as reverse engineering the planets. What is implied in the phrase is the notion that there is a plan, and that this plan is meant to achieve a function. It is a reference to the teleonomic nature of life in general, and of the nervous system in particular: the brain is not just a soup of neurons, these neurons coordinate their action so as to achieve some function (to survive, to reproduce, etc).

So the analogy is meaningful from this point of view, but like any analogy it has its limits. Is there no difference between a living being and an engineered artifact? This question points to what life is, which is a very broad question, but here I will just focus on two differences that I think are relevant for the present matter.

There is one very important specificity that was well explained by the philosopher Humberto Maturana (“The Organization of the living”, 1974). Engineered things have a structure that is designed so as to fulfill some function, that is, they are made of specific components that have to be arranged in a specific way, according to a plan. So all you need to understand is the structure, and its relation with the function. But as Maturana pointed out, living things have a structure (the body, the wiring of neurons, etc) but they also have an organization that produces that structure. The organization is a set of processes that produce the structure, which is itself responsible for the organization. But what defines the living being is its organization, not its structure, which can change. In the case of the nervous system, the wiring between neurons changes dramatically in the course of life, or even in the course of one hour, and the living being remains the same. The function of the organization is to maintain the conditions for its existence, and since it exists in a body interacting with an external environment, it is in fact necessary that the structure changes so as to maintain the organization. This is what is usually termed “plasticity” or “learning”. Therefore living things are defined by their organization, while engineered things are defined by their structure.

This is one aspect in which the engineering analogy is weak, because it misses this important distinction. Another one is that an engineered thing is made by an engineer, that is, by someone external to the object. Therefore the function is defined with respect to an external point of view. The plan would typically include elements that are defined in terms of physics, concepts that can only be grasped and measured by some external observer with appropriate tools. But a living organism only has its own senses and ways of interacting with the environment to make sense of the world. This is true of the nervous system as a whole, but also of individual cells: a cell has ways of interacting with other cells and possibly with the outside world, but it does not have a global picture of the organism. For example, an engineer's plan would specify where each component should go, e.g. with Euclidean coordinates. But this is not how development can work in a living thing. Instead, the plan should come in the form of mechanisms that specify not “where” a thing is, but rather “how to get there”, or perhaps even when a component should transform into a new component: specific ways of interacting that end up in the desired result.

Therefore the nature of the “plan” is really quite different from the plan of an engineer. To make my point, I will draw an analogy with philosophy of knowledge. A plan is a form of knowledge, or at least it includes some knowledge. For example, if the plan includes the statement “part A should be placed at such coordinates”, then there is implicit knowledge of Euclidean geometry on the part of the organism that executes the plan. For an engineer, knowledge comes from physics, and is based on the use of specific tools to measure things in the world. But for a cell, knowledge about the world comes just from the interaction with the world: different ways to sense it (e.g. incoming spikes for a neuron), different ways to act on it (e.g. producing a spike, releasing some molecules in the extracellular medium). A plan can be specified in terms of physics if it is to be executed by an engineer, but it cannot be specified in these terms if it is to be executed by a cell: instead, it would be specified in terms of mechanisms that make sense given the ways the cell can interact with the world. Implicit knowledge about the world that is included in an engineer's plan is what I will call “metaphysical knowledge”, by reference to the corresponding notion in philosophy of science.

Science is made of universal statements, such as the law of gravitation. But not all statements are scientific, for example “there is a God”. In philosophy of science, Karl Popper proposed that a scientific statement is one that can potentially be falsified by an observation, whereas a metaphysical statement is a statement that cannot be falsified. For example, the statement “all penguins are black” is scientific, because I could imagine that one day I see a white penguin. On the other hand, the statement “there is a God” is metaphysical, because there is no way I can check. Closer to the matter of this text, the statement “the world is actually five-dimensional but we live in a three-dimensional subspace” is also metaphysical because independently of whether it is true or not, we have no way to confirm it or to falsify it given the way we interact with the world.

So what I call “metaphysical knowledge” in an engineer's plan is knowledge that cannot be corroborated or falsified by the organism that executes the plan, given its senses and possibilities for action. For example, consider the following statement: neurons in the lateral geniculate nucleus project to the occipital region of the brain. This includes metaphysical knowledge about where that region is, which is specified from the point of view of an external observer. This cannot be a biological plan. Instead, a biological plan would rather have to specify what kind of interaction a growing axon should have with its environment in order to end up in the desired region.

In summary, although the phrase “reverse engineering” acknowledges the fact that, contrary to physical things of nature such as planets, living things have a function, it misses several important specificities of life. One is that living things are defined by their organization, rather than by the changing structure that the organization produces, while engineered things are defined by their structure. Another one is that the “plan”, which defines that organization, is of a very different nature than the plan made by and for an engineer, because in the latter case the function and the design are conceived from an external point of view, which generally includes “metaphysical knowledge”, i.e., knowledge that cannot be grasped from the perspective of the organism.

Rate vs. timing (XIX) Spike timing precision and sparse coding

Spike-based theories are sometimes discarded on the basis that spike timing is not reproducible in vivo, in response to the same stimulus. I already argued that, in addition to the fact that this is a controversial statement (because for example this could be due to a lack of control of independent variables such as attentional state), this is not a case for rate-based theories but for stochastic theories.

But I think it also reveals a misunderstanding of the nature of spike-based theories, because in fact, even deterministic spike-based theories may predict irreproducible spike timing. Underlying the argument of noise is the assumption that spikes are produced by applying some operation on the stimulus and then producing the spikes (with some decision threshold). If the timing of these spikes is not reproducible between trials, so the argument goes, then there must be noise inserted at some point in the operation. However, spike-based theories, at least some of them, do not fit this picture. Rather, the hypothesis is that spikes produced by different neurons are coordinated so as to produce some function. But then there is no reason why spikes need to be produced at the same time by the same neurons in all trials in order to produce the same global result. What matters is that spikes are precisely coordinated, which means that the firing of one neuron depends on the previous firing of other neurons. So if one neuron misses a spike, for example, then it will affect the firing of other neurons, precisely so as to make the computation more reliable. In other words, it is implied by the hypothesis of precise spike-based coordination that the firing of a spike by a single neuron should impact the firing of all other neurons, which makes individual firing non-reproducible.

The theory of sparse coding is in line with this idea. In this theory, it is postulated that the stimulus can be reconstructed from the firing of neurons. That is, each spike contributes a “kernel” to the reconstruction, at the time of the spike, and all such contributions are added together so that the reconstruction is as close as possible to the original stimulus. Note how this principle is in some way the converse of the previously described principle: the spikes are not described as the result of a function applied to the stimulus, but rather the stimulus is described as a function of the spikes. So spike encoding is defined as an inverse problem. This theory has been rather successful in explaining receptive fields in the visual (Olshausen) and auditory (Lewicki) systems. It is also meant to make sense from the point of view of minimizing energy consumption, as it minimizes the number of spikes required to encode the stimulus with a given precision. There are two interesting points here, regarding our present discussion. First, it appears that spikes are coordinated in the way I just described above: if one spike is missed, then the other spikes should be produced so as to compensate for this loss, which means there is a precise spike-based coordination between neurons. Second, the pattern of spikes is seen as a solution to an inverse problem. This implies that if the problem is degenerate, then there are several solutions that are equally good in terms of reconstruction error. Imagine for example that two neurons contribute exactly the same kernel to the reconstruction (which is not useless if one considers the fact that firing rate is limited by the refractory period). Then on one given trial, either of these two neurons may spike. From the observer's point of view, this represents a lack of reproducibility. However, this lack of reproducibility is precisely due to the fact that there is a precise spike-based coordination between neurons: to minimize the reconstruction error, just one of the two neurons should be active, and the timing should be precise too.
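The two-neuron thought experiment can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the theory itself: the Gaussian kernel shape, the names `neuron_1` and `neuron_2`, and the toy stimulus are all invented for the example.

```python
import math

def kernel(t, spike_time, width=1.0):
    """Gaussian bump centred on the spike time (hypothetical kernel shape)."""
    return math.exp(-((t - spike_time) ** 2) / (2 * width ** 2))

def reconstruct(spikes, times):
    """Add one kernel per spike; `spikes` is a list of (neuron_id, spike_time).

    The neuron identity is ignored on purpose: both neurons are assumed to
    contribute exactly the same kernel, as in the thought experiment.
    """
    return [sum(kernel(t, s) for _, s in spikes) for t in times]

def squared_error(stimulus, reconstruction):
    return sum((x - y) ** 2 for x, y in zip(stimulus, reconstruction))

times = [0.1 * i for i in range(100)]
# Toy stimulus that a single well-timed spike reconstructs exactly.
stimulus = [kernel(t, 5.0) for t in times]

trial_a = [("neuron_1", 5.0)]                      # neuron 1 fires
trial_b = [("neuron_2", 5.0)]                      # neuron 2 fires instead
both = [("neuron_1", 5.0), ("neuron_2", 5.0)]      # uncoordinated: both fire
mistimed = [("neuron_1", 5.5)]                     # right neuron, wrong time

err_a = squared_error(stimulus, reconstruct(trial_a, times))
err_b = squared_error(stimulus, reconstruct(trial_b, times))
err_both = squared_error(stimulus, reconstruct(both, times))
err_mistimed = squared_error(stimulus, reconstruct(mistimed, times))
```

In this sketch the two single-spike trials give the same error, so which neuron fires is free to vary from trial to trial, whereas firing both neurons or shifting the spike time degrades the reconstruction.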

Sparse coding with spikes also implies that reproducibility should depend on the stimulus. That is, a stimulus that is highly redundant such as a sinusoidal grating makes a degenerate inverse problem, leading to lack of reproducibility of spikes, precisely because of the coordination between spikes; a stimulus that is highly informative such as a movie of a natural scene should lead to higher reproducibility of spikes. Therefore, in the sparse coding framework, the spike-based coordination hypothesis predicts, contrary to rate-based theories, that spike time reproducibility should depend on the information content of the stimulus, in the sense that a more predictable stimulus leads to more irreproducible spiking. But even when spiking is not reproducible, it is still precise.

Rate vs. timing (XVIII) Spiking as analog-digital conversion: the evolutionary argument

Following on the previous post, with the analog-digital analogy often comes the idea that the relation between rates and spikes is that of an analog-digital conversion. Or spikes are seen as an analog-digital conversion from the membrane potential. I believe this comes from the evolutionary argument that it seems that spikes appeared for fast propagation of information over long distances, and not because there is anything special about them in terms of computation. It is quite possible that this was indeed the evolutionary constraint that led to the appearance of action potentials (although this is pure speculation), but even if this is true, the reasoning is wrong: for example, the ability of humans to make tools might have developed because they stood up. Yet standing up does not explain tool-making at all. So standing up allows new possibilities, but these possibilities follow a distinct logic. Spikes might have appeared primarily to transmit information over long distances, but once they are there, they have properties that are used, possibly for other purposes, in new ways. In addition, that they appeared to transmit information and the information was analog does not mean information is now used in the same way. Consider: to transmit information over long distances, one uses Morse code on the telegraph. Do you speak to the telegraph? No, you change the code and use a discrete code that has little connection with the actual sound wave. Finally, even if all this makes sense, it still is not an argument in favor of rate-based theories, because rate is an abstract quantity that is derived from spikes. So if we wanted to make the case that spikes are only there to carry a truly analog value, the membrane potential, then it would lead us to discard spikes as a relevant descriptive quantity, and a fortiori to discard rates as well. From a purely informational viewpoint (in the sense of Shannon), spikes produced by a neuron carry less information than its membrane potential, but rate carries even less information, since it is abstracted from spikes.
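One way to write this ordering down, as a sketch, is the data processing inequality. It assumes that the spike train is determined by the membrane potential alone and the rate by the spike train alone. The symbols S, V, N and R are notation introduced here for illustration.

```latex
% S: stimulus, V: membrane potential, N: spike train, R: firing rate
% Assumed Markov chain: S \to V \to N \to R
I(S;V) \;\ge\; I(S;N) \;\ge\; I(S;R)
```

Strictly, the inequalities are not strict: each processing stage can lose information about the stimulus but cannot add any.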

Rate vs. timing (XVII) Analog vs. digital

It is sometimes stated that rate-based computing is like analog computing while spike-based computing is like digital computing. The analogy comes from the fact, of course, that spikes are discrete whereas rates are continuous. But like any analogy, it has its limits. First of all, spikes are not discrete in the way digital numbers are discrete. In digital computing, the input is a stream of binary digits, coming one after another in a cadenced sequence. The digits are gathered in blocks, say of 16 or 32, to form words that stand for instructions or numbers. Let us examine these two facts with respect to spikes. Spikes do not arrive in a cadenced sequence. Spikes arrive at irregular times, and time is continuous, not digital. What was meant by digital is presumably that there can be a spike or there can be no spike, but there is nothing in between. However, given that there is also a continuous timing associated with the occurrence of a spike, a spike is better described as a timed event rather than as a binary digit. But of course one could decide to divide the time axis into small time bins, and associate a digit 0 when there is no spike and 1 when there is a spike. This is certainly true, but as one performs this process as finely as possible to approximate the real spike train, it appears that there are very few 1s drowned in a sea of 0s. This is what is meant by “event”: information is carried by the occurrence of 1s at specific times rather than by the specific combinations of 0s and 1s, as in digital computing. So in this sense, spike-based computing is not very similar to digital computing.
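The binning argument can be checked with a minimal sketch. The spike train here is made up (ten spikes drawn uniformly over one second), and the function name `binarize` and the chosen bin widths are assumptions for illustration only.

```python
import random

random.seed(0)
duration = 1.0   # seconds (arbitrary)
n_spikes = 10    # invented spike count
spike_times = sorted(random.uniform(0.0, duration) for _ in range(n_spikes))

def binarize(spike_times, duration, bin_width):
    """Turn spike times into a 0/1 sequence: 1 if the bin contains a spike."""
    n_bins = int(round(duration / bin_width))
    bits = [0] * n_bins
    for t in spike_times:
        index = min(int(t / bin_width), n_bins - 1)  # guard the last bin edge
        bits[index] = 1
    return bits

# Fraction of bins holding a 1, for finer and finer bins.
fractions = {}
for bin_width in (0.1, 0.01, 0.001, 0.0001):
    bits = binarize(spike_times, duration, bin_width)
    fractions[bin_width] = sum(bits) / len(bits)
```

As the bins get finer, the number of 1s stays bounded by the number of spikes while the number of 0s grows without limit, so the binary word approaches a list of event times.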

The second aspect of the analogy is that digits are gathered in words (of say 32 digits), and these words are assigned a meaning in terms of either an instruction or a number. Transposed to spikes, these “words” could be the temporal pattern of spikes of a single neuron, or perhaps more meaningfully a pattern of spikes across neurons, as in synchrony-based schemes, or across neurons and time, as in polychronization. Now there are two ways of understanding the analogy. Either a spike pattern stands for a number, and in this case the analogy is not very interesting, since this is pretty much saying that spikes implement an underlying continuous value, in other words this is the rate-based view of neural computation. Or a spike pattern stands for a symbol. This case is more interesting, and it may apply to some proposed spike-based schemes (like polychronization). It emphasizes the idea that unlike rate-based theories, spike-based theories are not (necessarily) related to usual mathematical notions of calculus (e.g. adding numbers), but possibly to more symbolic manipulations.

However, this does not apply to all spike-based theories. For example, in Sophie Denève’s spike-based theory of inference (which I will describe in a future post), spike-based computation actually implements some form of calculus. But in her theory, analog signals are reconstructed from spikes, in the same way as the membrane potential results from the action of incoming spikes, rather than the other way around as in rate-based theories (i.e., a rate description is postulated, then spikes are randomly produced to implement that description). So in this case the theory describes some form of calculus, but based on timed events.

This brings me to the fact that neurons do not always interact with spikes. For example, in the retina, there are many neurons that do not spike. There are also gap junctions, in which the membrane potentials of several neurons directly interact. There are also ephaptic interactions (through the extracellular field potential). There is also evidence that the shape of action potentials can influence downstream synapses (see a recent review by Dominique Debanne). In these cases, we may speak of analog computation. But this does not bring us closer to rate-based theories. In fact, quite the opposite: rates are abstracted from spikes, and stereotypical spikes are an approximation of what really goes on, which may involve other physical quantities. The point here is that firing rate is not a physical quantity in the way the membrane potential is, for example. It is an abstract variable. In this sense, spike-based theories, because they are based on actual biophysical quantities in neurons, might be closer to what we might call “analog descriptions” of computation than rate-based theories.

Complex representations and representations of complex things

In a previous post, I noted that the concept of neural assembly is limited by the fact that it does not represent relations. But this means that it is not possible to represent in this way a complex thing such as a car or a face. This might seem odd since many authors claim that there are neurons or groups of neurons that code for faces (in IT). I believe there might be again some confusion between representation and information in Shannon’s sense. What is meant when it is stated that an assembly of neurons codes for a face is that its activity stands for the presence of a face in the visual field. So in this sense the complex thing, the face, is represented, but the representation itself is not complex. With such a concept of representation, complex things can only be represented by removing all complexity.

This is related to the problem of invariant representations. How is it that we can recognize a face under different viewpoints, lighting conditions, possibly changes in hair style and facial expressions? One answer is that there must be a representation that is invariant, i.e., a neural assembly that codes for the concept “Paul’s face” independently of the specific way it can appear. However, this is an incomplete answer, for when I see Paul’s face, I can recognize that it’s Paul, but I can also see that he smiles, that I’m looking at him from the side, that he dyed his hair black. It’s not that by some process I have managed to remove all details that are not constituent of the identity of Paul’s face, but rather I am seeing everything that makes Paul’s face, both in the way it usually appears and in the specific way it appears this time. So the fact that we can recognize a complex thing in an invariant way does not mean that the complexity itself is discarded. In reality we can still register this complexity, and our mental representation of a complex thing is indeed complex. As I argued before, the concept of neural assembly is too crude to capture such complexity.

The concept of invariance is even more interesting when applied to categories of objects, for example a chair. In contrast with Paul’s face, different chairs are not just different viewpoints on the same physical object, they really are different physical objects. They can have different colors, widely different shapes and materials. They usually have four legs, but surely we would recognize a three-legged chair as such. What really makes a chair is that one can sit on it and rest one's back against it. This is related to Gibson’s concept of “affordances”. Gibson argued that we perceive the affordances of things, i.e., the possibilities of interaction with things.

So now I could imagine that there is an assembly of neurons that codes for the category “chair”. This is fine, but this is only something that stands for the category, it does not describe what this category is. It is not the representation of an affordance. Representing it would involve representing the potential action that one could make with that object. I do not know what kind of neural representation would be adequate, but it would certainly be more complex (i.e., structured) than a neural assembly.

What is computational neuroscience? (IX) The epistemological status of simulations

Computational neuroscience is not only about making theories. A large part of the field is also about simulations of neural models on a computer. In fact, there is little theoretical work in neuroscience that does not involve simulations at some stage. The epistemological status of simulations is quite interesting, and studies about it in philosophy of knowledge are quite recent. There is for example the work of Eric Winsberg, but I believe it mostly addresses questions related to physics. In particular, he starts one of his most cited papers (“Simulations, models and theories”, 2001) by stating: “I will be talking about the use of computers for modeling very complex physical phenomena for which there already exist good, well-understood theories of the processes underlying the phenomena in question”. This is an important distinction, and I will come back to it.

What is interesting about simulations from an epistemological viewpoint is that from a strictly Popperian viewpoint, simulation is useless. Indeed it looks like a sort of experiment, but there is no interaction with the world. It starts from a theory and a set of factual statements, and derives another set of factual statements. It is neither the making of a theory (no universal statement is produced), nor the test of a theory. So why is it that simulation is used so broadly?

In fact there are different types of simulation work. Broadly speaking, we may think of two categories: theory-driven simulations, and data-driven simulations.

I will start with theory-driven simulations. There are in fact two different motivations to use simulations in theoretical work. One is exploratory: simulations are used in the process of making theories, because the models are so complex that it may be difficult to predict their behavior. This is a general problem with so-called complex systems. Simulations are then used for example to explore the effect of various parameters on the behavior of the model, or to see whether some property can appear given a set of rules, etc. Another motivation is to test a theory. Now this may seem odd since we are not speaking of an empirical test. First of all, this apparent oddity perhaps stems from the myth that theoretical work is essentially about making logical deductions from initial statements. But in reality, especially in biology where models can be very complex, theoretical work almost invariably involves some guess work, approximations, and sometimes vaguely justified intuitions. Therefore, it makes sense to check the validity of these approximations in a number of scenarios. For example, in my paper with Jonathan Platkiewicz about the spike threshold, we derived an equation for the spike threshold from the Hodgkin-Huxley equations. It involved approximations of the sodium current, and we also developed the theory in an isopotential neuron. Therefore in that paper, we checked the theory against the numerical simulation of a complex multicompartmental neuron model, and it was not obvious that it would work.

There is another motivation, which is more specific to computational neuroscience. Theories in this field are about how the interaction of neurons produces behavior, or in other words, about linking physiology, at the neuron level, and function, at the systems or organism level. But to speak of function, one needs an environment. This external element is not part of the neural model, yet it is critical to the relevance of the model. Theories generally do not include explicit models of the environment, or only simplistic versions. For example, in my paper about sound localization with Dan Goodman, we proposed a mechanism by which selective synchrony occurs when a sound is presented at specific locations, leading to a simple spiking model that can accurately estimate the location of a sound source in the presence of realistic diffraction properties. In principle it works perfectly, but of course in a real environment the acoustical signals are unknown, though not arbitrary: they may have a limited spectrum, there may be noise, diffraction properties are also unknown but not arbitrary, there may be ambiguities (e.g. the cones of confusion), etc. For this reason, the model needed to be implemented and its performance tested, which we did with recorded sounds, measured acoustical filters and acoustical noise. Thus it appears that even for theory-driven work, simulation is unavoidable because the theory applies to the interaction with an unknown, complex environment. In fact, ideally, models should be simulated, embodied (in a robot) and allowed to interact with a real (non simulated) environment. Since theories in computational neuroscience claim to link physiology and function, this would be the kind of empirical work required to substantiate such claims.

The other type of simulation work is data-driven. I believe this is usually what is meant by “simulation-based science”. In this kind of work, there is little specific theory; that is, only established theories are used, such as cable equation theory. Instead, models are built based on measurements. The simulations are then truly used as a kind of experiment, to observe what might emerge from the complex interaction of neuron models. It is sometimes said that simulations are used to do “virtual experiments” when the actual experiments would be impractical. Another typical use is to test the behavior of a complex model when parameters are varied in a range that is considered plausible.

In physics, such computer simulations are also used, for example to simulate the explosion of a nuclear bomb. But as Winsberg noted, there is a very important epistemological distinction between simulations in physics and in biology: in the former, there is an extremely detailed knowledge of both the laws that govern the underlying processes and of the arrangement of the individual elements in the simulations. Note that even in this case, the value of such simulations is controversial. But in the case of biology and especially neuroscience, the situation is quite different. It is in fact acknowledged by the typical use cases mentioned above.

Consider the statement that a simulation is used to perform a “virtual experiment” when actual experiments are impractical. This seems similar to the simulation of a nuclear explosion. In that case, one is interested in the large scale behavior of the system, and at such a large scale the experiment is difficult to do. But in neuroscience, the situation is exactly the opposite. The experiment with a full organism is actually what is easy to do (or at least feasible), it is a behavioral experiment. So simulations are not used to observe how an animal behaves. They are used to observe the microstructure of the system. But then this means that this microstructure was not known at the time when the model was built, and so these properties that are to be observed are considered as sufficiently constrained by the initial set of measurements to be derived from them.

The second, and generally complementary, use case is to simulate the model while varying a number of parameters so as to find the viable region in which the model produces results consistent with some higher-order measurements (for example, local field potentials). If the parameters are varied, then this means they are actually not known with great certainty. Thus it is clear that biophysical models based on measurements are in fact much less reliable than physical models such as those of nuclear explosions.

One source of uncertainty is the values of parameters in the models, for example channel densities. This is already one great problem. Probably the biggest issue here is not so much the uncertainty about parameters, which is an issue in models of all fields, but the fact that the parameters are most likely not independent, i.e., they covary in a given cell or between cells. This lack of independence comes from the fact that the model is of a living thing, and in a living thing all components and processes contribute to the function of the organism, which implies tight relations between them. The study of these relations is a defining part of biology as a field, but if a model does not explicitly include these relations, then it would seem extraordinary that proper function can arise without them, given that they are hidden under the uncertainty in the parameters. For example, consider action potential generation. Sodium channels are responsible for initiation, potassium channels for repolarization. There are a number of recent studies showing that their properties and densities are precisely tuned with respect to each other so that energy consumption is minimized: indeed energy is lost if they are simultaneously open because they have opposite effects. If this functional relation were unknown and only channel densities were known within some range, then the coordination would go unnoticed and a naive model simply using independent values from these distributions would display inefficient action potential generation, unlike real neurons.
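The effect of ignoring covariation can be illustrated with a toy sketch. Nothing in it is a biophysical model: the linear tuning rule `RATIO`, the numerical ranges, and the "mismatch" measure standing in for wasted energy are all invented for the example.

```python
import random

rng = random.Random(0)
RATIO = 0.3      # hypothetical tuning rule: g_k is close to RATIO * g_na
N_CELLS = 1000

# "Real" cells: the two channel densities covary (arbitrary units).
g_na = [rng.uniform(50.0, 150.0) for _ in range(N_CELLS)]
g_k = [RATIO * g + rng.uniform(-1.0, 1.0) for g in g_na]

def mean_mismatch(g_na, g_k):
    """Average departure from the tuning rule, a stand-in for wasted energy."""
    return sum(abs(k - RATIO * n) for n, k in zip(g_na, g_k)) / len(g_na)

# Naive model: the same measured values for each channel, paired at random.
# The marginal distributions are identical; only the covariation is lost.
g_k_shuffled = g_k[:]
rng.shuffle(g_k_shuffled)

coordinated = mean_mismatch(g_na, g_k)
independent = mean_mismatch(g_na, g_k_shuffled)
```

Both populations would pass the same check against measured ranges of each density taken separately, yet only the first respects the relation between them, which is the information a model built from independent measurements does not contain.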

I will try to summarize the above point. Such simulations are based on the assumption that the laws that govern the underlying processes are very well understood. This may well be true for the laws of neural electricity (cable equations, Hodgkin-Huxley equations). However, in biology in general and in neuroscience in particular, the relevant laws are also those that describe the relations between the different elements of the model. This is a completely different set of laws. For the example of action potential generation, the laws are related to the co-expression of channels, which is more related to the molecular machinery of the cell than to its electrical properties.

Now these laws, which relate to the molecular and genetic machinery, are certainly not so well known. And yet they are more relevant to what defines a living thing than those describing the propagation of electrical activity, since these are the laws that maintain the structure that keeps the cells alive. Thus, models based on measurements attempt to reproduce biological function without capturing the logic of the living, and this seems rather optimistic. There are also many examples in recent research showing that our current knowledge of neural function is rather poor compared to what remains to be discovered. For example, glial cells (which make up most of the cells in the brain) are now thought to play a much more important role in brain function than previously believed, and they are generally ignored in models. Another example is action potential initiation. Detailed biophysical models are based on morphological reconstructions of the axon, but in the axon initial segment there is also a scaffold that presumably alters the electrical properties along the axon (for example, the axial resistivity should be higher).

All these remarks are meant to point out that it is illusory to think that there are, or will be in the near future, realistic models of neural networks based on measurements. What is worse, such models seem to miss a critical point in the study of living systems: these are defined not only by their structure (values of parameters, shape of cells) but by the processes that maintain that structure and produce function. To quote Maturana (1974), there is a difference between the structure (channel densities etc) and the organization, which is the set of processes that set up that structure, and it is the organization, not the structure, that defines a living thing. Epistemologically speaking, the idea that things not accessible to experiment can be simulated based on measurements that constrain a model is induction. But the predictive power of induction is rather limited when there is such uncertainty.

I do not want to sound as if I were entirely dismissing data-driven simulations. Such simulations can still be useful as an exploratory tool. For example, one may simulate a neuron using measured channel densities and test whether the results are consistent with what the actual cell does. If they are not, then we know we are missing some important property. But it is wrong to claim that such models are more realistic because they are based on measurements. On the one hand, they are based on empirical measurements; on the other hand, they dismiss mechanisms (or “principles”), which are another empirical aspect to be accounted for in living things. I will come back in a later post to the notion of “realistic model”.

The villainous monster recursion

In O’Regan’s paper about the sensorimotor theory of perception (O’Regan and Noë, BBS 2001), he uses the analogy of the “villainous monster”. I quote it in full:

“Imagine a team of engineers operating a remote-controlled underwater vessel exploring the remains of the Titanic, and imagine a villainous aquatic monster that has interfered with the control cable by mixing up the connections to and from the underwater cameras, sonar equipment, robot arms, actuators, and sensors. What appears on the many screens, lights, and dials, no longer makes any sense, and the actuators no longer have their usual functions. What can the engineers do to save the situation? By observing the structure of the changes on the control panel that occur when they press various buttons and levers, the engineers should be able to deduce which buttons control which kind of motion of the vehicle, and which lights correspond to information deriving from the sensors mounted outside the vessel, which indicators correspond to sensors on the vessel’s tentacles, and so on.”

It is meant here that all knowledge must come from the sensors and the effect of actions on them, because there is just no other source of knowledge. This point of view changes the computational problem of perception from inferring objective things about the physical world from the senses to finding relations between actions and sensor data.
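As a minimal sketch of this reformulated problem, consider a toy version of the scrambled vessel. The setup is an assumption made for illustration only: five buttons, five actuators and five dials, one binary dial moving per press, and two hidden permutations standing for the monster's rewiring. The names `press` and `learn_contingencies` are hypothetical.

```python
import random

random.seed(1)

N_CHANNELS = 5

# Hidden from the engineers: how the monster rewired the cables.
# Button b drives actuator button_to_actuator[b]; the sensor mounted on
# actuator a shows up on dial actuator_to_dial[a].
button_to_actuator = random.sample(range(N_CHANNELS), N_CHANNELS)
actuator_to_dial = random.sample(range(N_CHANNELS), N_CHANNELS)


def press(button):
    """The 'world': return the dial readings after pressing one button."""
    dials = [0] * N_CHANNELS
    dials[actuator_to_dial[button_to_actuator[button]]] = 1
    return dials


def learn_contingencies():
    """Recover 'if I press b, dial d moves' from presses and readings only."""
    learned = {}
    for button in range(N_CHANNELS):
        readings = press(button)
        learned[button] = readings.index(1)
    return learned


contingencies = learn_contingencies()
for button, dial in contingencies.items():
    print(f"if I press button {button}, then dial {dial} moves")
```

Note what the learner ends up with: relations between buttons and dials. It never recovers which physical actuator sits behind a button, because nothing in the control panel gives access to that.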

This remark is not specific to the brain. It would apply whether the perceptual system is made of neurons or not; for example, it could be an engineered piece of software for a robot. So what in fact is specific about the brain? The question is perhaps too broad, but I can at least name one specificity. The brain is made of neurons, and each neuron is a separate entity (with a membrane) that interacts with other neurons. These entities are relatively elementary (compared to the entire organism) and essentially identical (in broad outline). Each entity has sensors (dendrites) and can act by sending spikes through its axon (and also in other ways, but on a slower timescale). So in fact we could think of the villainous monster concept at different levels. The highest level is the organism, with sensors (photoreceptors) and actuators (muscle contraction). At a lower level, we could consider a brain structure, for example the hippocampus, and see it as a system with sensors (spiking inputs to the hippocampus) and actuators (spiking outputs). What can be said about the relationship between actions and sensor inputs? In fact, we could arbitrarily define a system by making a graph cut in the connectivity graph of the brain. At the final level of analysis, we might analyze the neuron as a perceptual system, with a set of sensors (dendrites) and one possible action (to produce a spike). At this level, it may also be possible to define the same neuron as a different perceptual system by redefining the set of sensors and actions. For example, sensors could be a number of state variables, such as membrane potential at different points along the dendritic tree, calcium concentration, etc; actions could be changes in channel densities, in synaptic weights, etc. This is not completely crazy because in a way, these sensed properties and the effects of cellular actions are all that the cell can ever know about the “outside world”.

One might call this conceptual framework the “villainous monster recursion”. I am not sure where it could lead, but it seems intriguing enough to think about!

On the notion of information in neuroscience

In a previous post, I criticized the notion of “neural code”. One of my main points was that information can only make sense in conjunction with a particular observer. I am certainly not the first one to make this remark: for example, it is presented in a highly cited review by deCharms and Zador (2000). More recently Buzsaki defended this point of view in a review (Neuron 2010), and from the notes in the supplemental material, it appears that he is clearly more philosophically lucid than the average neuroscientist on these issues (check the first note). I want to come back to this issue in more detail.

When one speaks of information or code in neuroscience, it is generally meant in the sense of Shannon. This is a very specific notion of information coming from communication theory. There is an emitter who wants to transmit some message to a receiver. The message is transmitted in an altered form called “code”, for example Morse code, which contains “information” insofar as it can be “decoded” by the receiver into the original message. The metaphor is generally carried over to neuroscience in the following form: there are things in the external world that are described in some way by the experimenter, for example bars with a variable orientation, and the activity of the nervous system is seen as a “code” for this description. It may carry “information” about the orientation of the bar insofar as one can reconstruct the orientation from the neural activity.

It is important to realize how limited this metaphor is, and indeed that it is a metaphor. In a communication channel, the two ends agree upon a code, for example on the correspondence between letters and Morse code. For the receiving end, the fact that the message is information in the common sense of the word relies on two things: 1) that the correspondence is known, 2) that the initial message itself makes sense for the receiver. For example, imagine a few centuries ago, someone is given a papyrus with ancient Egyptian hieroglyphs. Probably it will represent very little information for that person because she has no way to make sense of it. The papyrus becomes informative with the Rosetta stone, where the same text is written in ancient Egyptian and in ancient Greek, so that the papyrus can be translated to ancient Greek. But of course this becomes information only if ancient Greek makes sense for the person that reads it!

So the metaphor of a “neural code”, understood in Shannon’s sense, is problematic in two ways: 1) the experimenter and the nervous system obviously do not agree upon a code, and 2) how the original “message” makes sense for the nervous system is left entirely unspecified. I will give another example to make this clearer. Imagine you have a vintage (non-digital) thermometer without any graduation. You could replace the thermometer by the activity of a temperature-sensitive neuron. From the point of view of information theory, there is just as much information about temperature in the liquid level as there would be if temperature were given as a number of degrees Celsius. But clearly, for an observer, there is very little information, because one does not know the relationship between the level of the liquid and the physical temperature, so it is essentially useless. Perhaps one could say that the level says something relative about temperature, that is, whether one temperature is hotter than another. But even this is not true, because it relies on the prior knowledge that the level of the liquid increases when the temperature increases, a physical law that is not obvious at all. So to make sense of the liquid level, one would actually rely on association with other sources of information that are not given by the thermometer, e.g. that for some level one feels cold and that for another level one feels hot. But now this means that the information in the liquid level is actually limited (and in fact defined) not by the “communication channel” (how accurate the thermometer is) but by the external source of knowledge that provides meaning to the liquid level. This limitation comes from the fact that at no moment in time is the true temperature in kelvins given as an objective truth to the observer. The only way the observer gets information is through its own sensors. This is why Shannon’s information is highly misleading as a metaphor for information in biological systems: there can be no agreed code between the environment and the organism. The organism has to learn ancient Egyptian just with hieroglyphs.
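The claim that the ungraduated thermometer carries just as much Shannon information can be checked numerically. The following is a sketch under stated assumptions: temperatures are whole degrees between 0 and 30, the thermometer errs by at most one degree, and removing the graduation is modeled as replacing each reading with an arbitrary one-to-one label (which also scrambles the ordering, in line with the remark that even the ordering is not given for free). The function `mutual_information` is a plain plug-in estimate written for this example.

```python
import math
import random
from collections import Counter

random.seed(2)


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Temperatures in whole degrees Celsius, read by a slightly noisy thermometer.
temperatures = [random.randint(0, 30) for _ in range(5000)]
celsius_readings = [t + random.choice([-1, 0, 1]) for t in temperatures]

# The same thermometer without graduation: each reading becomes an
# arbitrary label. The mapping is one-to-one but unknown to the observer.
possible = sorted(set(celsius_readings))
labels = possible[:]
random.shuffle(labels)
unlabeled = dict(zip(possible, labels))
level_readings = [unlabeled[r] for r in celsius_readings]

mi_celsius = mutual_information(list(zip(temperatures, celsius_readings)))
mi_level = mutual_information(list(zip(temperatures, level_readings)))

print(f"I(temperature; Celsius reading) = {mi_celsius:.3f} bits")
print(f"I(temperature; unlabeled level) = {mi_level:.3f} bits")
```

The two numbers are equal, because mutual information is unchanged by any one-to-one relabeling of the readings. The quantity is computed by someone who has access to both columns, temperatures and readings; the observer holding only the thermometer has the second column alone.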

To finish with this example, imagine now that the thermometer is graduated, so you can read the temperature. Wouldn’t this provide the objective information that was previously missing? As a matter of fact, not really. For example, as a European, if I am given the temperature in degrees Fahrenheit, I have no idea whether it is hot or cold. So the situation is no different for me than before. Of course if I am also given the correspondence between Fahrenheit and Celsius, then it will start making sense for me. But how can degrees Celsius make sense for me in the first place? Again these are just numbers with arbitrary units. Degrees Celsius make sense because they can be related to physical processes linked with temperature: water freezes at 0° and boils at 100°. Presumably, the same thing applies to our perception of temperature: the body senses a change in firing rate of some temperature-sensitive neuron, and this becomes information about temperature because it can be associated with a number of biophysical processes linked with temperature, say sweating, and all these effects can be noticed. In fact, what this example shows is that the activity of the temperature-sensitive neuron does not provide information about physical temperature (a number of kelvins), but rather about the occurrence of various other events that can be captured with other sensors. This set of relationships between events is, in a way, the definition of temperature for the organism, rather than some number in arbitrary units.

Let us summarize. In Shannon’s information theory, it is implicitly assumed that there are two ends in a communication channel, and that 1) both ends agree upon a code, i.e., a correspondence between descriptive elements of information on both ends, and that 2) the initial message on the emitter end makes sense for the observer at the other end. Neither of these two assumptions applies to a biological organism because there is only one end. All the information that it can ever get about the world comes from that end, and so in this context Shannon’s information only makes sense for an external observer who can see both ends. A typical error coming from the failure to realize this fact is to greatly overestimate the information in neural activity about some experimental quantity. I discussed this specific point in detail in a recent paper. The overestimation comes simply from the fact that detailed knowledge about the experiment is implicitly assumed on behalf of the nervous system.

Followed to its logical conclusions, the information-processing line of reasoning leads to what Daniel Dennett called the “Cartesian theater”. If neural activity gives information about the world in Shannon’s sense, then this means that at some final point this neural activity has to be analyzed and related to the external world. Indeed if this does not happen, then we cannot be speaking about Shannon information, for there is no link with the initial message. So this means that there is some critical stage at which neural activity is interpreted in objective terms. As Dennett noted, this is conceptually not very far from the dualism of Descartes, who thought that there is a non-material mind that reads the activity of the nerves and interprets it in terms of the outside physical world. The “Cartesian theater” is the brain seen as a screen where the world is projected, that a homunculus (the mind) watches.

Most neuroscientists reject dualism, but if one is to reject dualism, then there must be no final stage at which the observer end of the communication channel (the senses) is put in relationship with the emitter end (the world). All information about the world must come from the senses, and the senses alone. Therefore, this “information” cannot be meant in Shannon’s sense.

This, I believe, is essentially what James Gibson meant when he criticized the information-processing view of cognition. It is also related to Hubert Dreyfus’s criticism of artificial intelligence. More recently, Kevin O’Regan made similar criticisms. In his most cited paper with Noë (O’Regan and Noë, BBS 2001), there is an illuminating analogy, the “villainous monster”. Imagine you are exploring the sea with an underwater vessel. But a villainous monster mixes all the cables and so all the sensors and actuators are now related to the external world in a new way. How can you know anything about the world? The only way is to analyze the structure of sensor data and their relationships with actions that you can perform. So if one rejects dualism, then this is the kind of information that is available to the nervous system. A salient feature of this notion of information is that, contrary to Shannon’s information, it is defined not as numbers but as relations or statements: if I do action A, then sensory property B happens; if sensory property A happens, then another property B will happen next; if I do action A in sensory context B, then C happens.


Philosophy of knowledge

We have concluded that, if dualism is to be rejected, then the right notion of information for a biological organism is in terms of statements. This makes the problem of perception quite similar to that of science. Science is made of universal statements, such as the law of gravitation. But not all statements are scientific, for example “there is a God”. In philosophy of knowledge, Karl Popper proposed that a scientific statement is one that can potentially be falsified by an observation, whereas a metaphysical statement is a statement that cannot be falsified. For example, the statement “all penguins are black” is scientific, because I could imagine that one day I see a white penguin. On the other hand, the statement “there is a God” is metaphysical, because there is no way I can check. Closer to the matter of this text, the statement “the world is actually five-dimensional but we live in a three-dimensional subspace” is also metaphysical because independently of whether it is true or not, we have no way to confirm it or falsify it.

To come back to the matter of this text, I propose to qualify as metaphysical for an organism all knowledge that cannot be falsified, given the senses and possibilities for action. For example, in an experiment, one could relate the firing rate of a neuron with the orientation of a bar presented in front of the eyes. There is information in Shannon’s sense about the orientation in the firing rate. This means that we can “decode” the firing rate into the parameter “orientation”. However, this decoding requires metaphysical knowledge because “orientation” is defined externally by the experimenter; it does not emerge from the neuron’s activity itself. From the neuron’s point of view, there is no way to falsify the statement “10 Hz means horizontal bar”, because the notion of horizontal (or bar) is either defined in relation to something external to the neuron, or by its activity itself (horizontal is when the activity is 10 Hz), and in this latter case the statement is a tautology.

Therefore it appears that there can be very little information without metaphysical knowledge in the response of a single neuron, or in its input. Note that it is not completely empty, for there could be information about the future state of the neuron in the present state.


The structure of information and “neural assemblies”

When information is understood as statements rather than numbers to be decoded, it appears that information to be represented by the brain is much richer than implied by the usual notion inspired by Shannon’s communication theory. In particular, the problem of perception is not just to relate a vector of numbers (e.g. firing rates) to a particular set of parameters representing an object in the world. What is to be perceived is much richer than that. For example, in a visual scene, there could be Paul, a person I know, wearing a new sweater, sitting in a car. What is important here is that a scene is not just a “bag of objects”: objects have relationships with each other, and there are many possible different relationships. For example there is a car and there is Paul, and Paul is in a specific relationship with the car, that of “sitting in it”.

Unfortunately this does not fit well with the concept of “neural assemblies”, which is the mainstream assumption about how things we perceive are represented in the brain. If it is true that any given object is represented by the firing of a given assembly of neurons, then several objects should be represented by the firing of a bigger assembly of neurons, the union of all assemblies, one for each object. Several authors have noted that this may lead to the “superposition catastrophe”, i.e., there may be different sets of objects whose representations are fused into the same big assembly. But let us assume that this problem has somehow been solved and that there is no possible confusion. Still, the representation of a scene can be nothing else than an unstructured “bag of objects”: there are no relationships between objects in the assembly representation. One way to save the assembly concept is to consider that there are combination assemblies, which code for specific combinations of things, perhaps in a particular relationship. But this cannot work if it is the first time I see Paul in that sweater. There is a fundamental problem with the concept of neural assembly, which is that there is no representation of relations, only of things to be related. In analogy with language, there is no syntax in the concept of neural assemblies. This is actually the analogy chosen by Buzsaki in his recent Neuron review (2010).
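The superposition catastrophe is easy to reproduce if assemblies are modeled as plain sets of neurons. This is only an illustrative sketch: the features, the neuron ids and the rule that a scene is the union of its assemblies are assumptions made here for the example, not a model taken from the literature.

```python
# Hypothetical assemblies: each feature is represented by a set of neuron ids.
ASSEMBLIES = {
    "red": {1, 2, 3},
    "blue": {4, 5, 6},
    "square": {7, 8, 9},
    "circle": {10, 11, 12},
}


def represent_object(features):
    """An object is the union of the assemblies of its features."""
    active = set()
    for feature in features:
        active |= ASSEMBLIES[feature]
    return active


def represent_scene(objects):
    """A scene is the union of the assemblies of all its objects."""
    active = set()
    for features in objects:
        active |= represent_object(features)
    return active


scene_a = represent_scene([("red", "square"), ("blue", "circle")])
scene_b = represent_scene([("blue", "square"), ("red", "circle")])

print(scene_a == scene_b)  # True: the two scenes are indistinguishable
```

A red square next to a blue circle and a blue square next to a red circle activate exactly the same neurons. A set union has no place to store which feature goes with which object, let alone which relation holds between objects.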

This remark, mostly made in the context of the binding problem, has led authors such as von der Malsburg to postulate that synchrony is used to bind the features of an object, as represented by neural firing. This avoids the superposition catastrophe because at a given time, only one object is represented by neural firing. It also addresses the problem of composition: by defining different timescales for synchrony, one may build representations for objects composed of parts, possibly in a recursive manner. However, the analogy of language shows that this is not going to be enough, because only one type of relation can be represented in this way. But the same analogy also shows that it is conceptually possible to represent structures as complex as linguistic structure by using time, in analogy with the flow of a sentence. Just for the sake of argument, and I do not mean that this is a plausible proposition (although it could be), you could imagine that assemblies can code either things (Paul, a car, a sweater) or relations between things (sitting, wearing), that only one assembly would be active at a time, and that the order of activation indicates which things a relation applies to. Here not only synchrony is important, but also the order of spikes. This idea is quite similar to Buzsaki’s “neural syntax” (based on oscillations), but I would like to emphasize a point that I believe has not been noticed: that assemblies must stand not only for things but also for relations between things (note that “things” can also be thought of as relations, and in this case we are speaking of relations of different orders).
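To make this thought experiment concrete, here is a small sketch in which a scene is an ordered sequence of assembly activations. The reading convention is entirely hypothetical, chosen only for illustration: assemblies fire one at a time, in groups of three, in the order thing, relation, thing. Nothing here is meant as a claim about how the brain does it.

```python
THINGS = {"Paul", "car", "sweater"}
RELATIONS = {"sitting_in", "wearing"}


def decode(sequence):
    """Read an ordered activation sequence as (subject, relation, object) triples.

    Assumed convention (hypothetical): assemblies fire one at a time, in
    groups of three, in the order thing, relation, thing.
    """
    if len(sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    statements = []
    for i in range(0, len(sequence), 3):
        subject, relation, obj = sequence[i:i + 3]
        if subject not in THINGS or relation not in RELATIONS or obj not in THINGS:
            raise ValueError(f"ill-formed triple at position {i}")
        statements.append((subject, relation, obj))
    return statements


scene = ["Paul", "sitting_in", "car", "Paul", "wearing", "sweater"]
swapped = ["car", "sitting_in", "Paul", "Paul", "wearing", "sweater"]

print(decode(scene))
print(set(scene) == set(swapped))        # the same assemblies were active
print(decode(scene) == decode(swapped))  # but the ordered readings differ
```

The two sequences activate the same set of assemblies, so a representation based on which assemblies are active cannot tell them apart. Only the order of activation separates “Paul is sitting in the car” from “the car is sitting in Paul”, and this works only because some assemblies stand for relations rather than things.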

All this discussion, of course, is only meant to save the concept of neural assembly and perhaps one might simply consider that a completely different concept should be looked for. I do not discard this more radical possibility. However, I note that if it is admitted that neurons interact mostly with spikes, then somehow the spatio-temporal pattern of spikes is the only way that information can be carried. Unless, perhaps, we are completely misled by the notion of “information”.