What is computational neuroscience? (XXVII) The paradox of the efficient code and the neural Tower of Babel

A pervasive metaphor in neuroscience is the idea that neurons “encode” stuff: some neurons encode pain; others encode the location of a sound; maybe a population of neurons encodes some other property of objects. What does this mean? In essence, that there is a correspondence between some objective property and neural activity: when I feel pain, this neuron spikes; or, the image I see is “represented” in the firing of visual cortical neurons. The mapping between the objective properties and neural activity is the “code”. How insightful is this metaphor?

An encoded message is understandable to the extent that the reader knows the code. But the problem with applying this metaphor to the brain is that only the encoded message is communicated, not the code, and not the original message. Mathematically, original message = encoded message + code, but only one term is communicated. This could still work if there were a universal code that we could assume all neurons can read, the “language of neurons”, or if somehow some information about the code could be gathered from the encoded messages themselves. Unfortunately, this contradicts the main paradigm in neural coding theory, “efficient coding”.

The efficient coding hypothesis stipulates that neurons encode signals into spike trains in an efficient way, that is, they use a code such that all redundancy is removed from the original message while preserving information, in the sense that the encoded message can be mapped back to the original message (Barlow, 1961; Simoncelli, 2003). This implies that with a perfectly efficient code, encoded messages are indistinguishable from random. Since the code is determined by the statistics of the inputs and only the encoded messages are communicated, a code is efficient to the extent that it is not understandable by the receiver. This is the paradox of the efficient code.
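
The claim that an efficient code looks random can be illustrated with a minimal sketch, assuming a general-purpose compressor (zlib) as a stand-in for an efficient encoder. This is not a model of neurons: the vocabulary, the message and the use of byte-histogram entropy as a crude proxy for "looks random" are all assumptions made for the illustration.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (maximum 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical redundant "original message": random words from a tiny vocabulary.
rng = random.Random(0)
VOCABULARY = ["spike", "neuron", "axon", "code", "signal", "brain", "sound", "image"]
message = " ".join(rng.choice(VOCABULARY) for _ in range(20000)).encode("ascii")

# zlib plays the role of the efficient encoder (an assumption of this sketch).
encoded = zlib.compress(message, 9)

original_bits = byte_entropy(message)
encoded_bits = byte_entropy(encoded)
print(f"original: {len(message)} bytes, {original_bits:.2f} bits/byte")
print(f"encoded:  {len(encoded)} bytes, {encoded_bits:.2f} bits/byte")

# Recovering the message requires the code (here, the decompression algorithm),
# which is exactly the part that is not communicated in the neural metaphor.
recovered = zlib.decompress(encoded)
```

The encoded bytes are shorter and their histogram is much flatter than that of the original text, so a receiver who only sees the encoded bytes has little statistical structure left from which to guess the code.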

In the neural coding metaphor, the code is private and specific to each neuron. If we follow this metaphor, this means that all neurons speak a different language, a language that allows expressing concepts very concisely but that no one else can understand. Thus, according to the coding metaphor, the brain is a Tower of Babel.

Can this work?

What is computational neuroscience? (XXVI) Is optimization a good metaphor of evolution?

Is the brain the result of optimization, and if so, what is the optimization criterion? The popular argument in favor of the optimization view goes as follows. The brain is the result of Darwinian evolution, and therefore is optimally adapted to its environment, ensuring maximum survival and reproduction rates. In this view, to understand the brain is primarily to understand what “adapted” means for a brain, that is, what is the criterion to be optimized.

Previously, I have pointed out a few difficulties in optimality arguments used in neuroscience, in particular the problem of specification (what is being optimized) and the fact that evolution is a history-dependent process, unlike a global optimization procedure. An example of this history dependence is the fascinating case of mitochondria. Mitochondria are organelles found in eukaryotic cells that produce most of the cellular energy in the form of ATP. At present, the main view is that these organelles are a case of symbiosis: mitochondria were once prokaryotic cells that were captured and farmed. This symbiosis has been selected and conserved through evolution, but optimization does not seem to be the most appropriate metaphor in this case.

Nonetheless, the optimization metaphor can be useful when applied to circumscribed problems that a biological organism might face, for example the energy consumption of action potential propagation. We can claim for example that, everything else being equal, an efficient axon is better than an inefficient one (with the caveat that in practice, not everything else can be made equal). But when applied at the scale of an entire organism, the optimization metaphor starts facing more serious difficulties, which I will discuss now.

When considering an entire organism, or perhaps an organ like the brain, what criterion can we possibly choose? Recently, I started reading “Guitar Zero” by Gary Marcus. The author points out that learning music is difficult, and argues that the brain has evolved for language, not music. This statement is deeply problematic. What does it mean that the brain has evolved for language? Language does not preexist speakers, so it cannot be that language was an evolutionary (“optimization”) criterion for the brain, unless we have a more religious view of evolution. Rather, evolutionary change can create opportunities, which might be beneficial for the survival of the species, but there is no predetermined optimization criterion.

Another example is the color visual system of bees (see for example Ways of coloring by Thompson et al.). A case can be made that the visual system of bees is adapted to the color of flowers they are interested in. But conversely, the color of flowers is adapted to the visual system of bees. This is a case of co-evolution, where the “optimization criterion” changes during the evolutionary process.

Thus, the optimization criterion does not preexist the optimization process, and this makes the optimization metaphor weak.

A possible objection is that there is a preexisting optimization criterion, which is survival or reproduction rate. While this might be correct, it makes the optimization metaphor not very useful. In particular, it applies equally to all living species. The point is, there are species and they are different even though the optimization criterion is the same. Not all have a brain. Thus, optimization does not explain why we have a brain. Species that have a brain have different brains. The nervous system of a nematode is not the same as that of a human, even though they are all equally well adapted, and have evolved for exactly the same amount of time. Therefore, the optimization view does not explain why we speak and nematodes don’t, for example.

The problem is that “fitness” is a completely contextual notion, which depends both on the environment and on the species itself. In a previous post where I discussed an “existentialist” view of evolution, I proposed the following thought experiment. Imagine a very ancient Earth with a bunch of living organisms that do not reproduce but can survive for an indefinite amount of time. By definition, they are adapted since they exist. Then at some point, an accident occurs such that one organism starts multiplying. It multiplies until it occupies the entire Earth and resources become scarce. At this point of saturation, organisms start dying. The probability of dying being the same for both non-reproducing organisms and reproducing ones, at some point there will be only reproducing organisms. Thus in this new environment, reproducing organisms are adapted, whereas non-reproducing ones are not. If we look at the history of evolution, we note that the world of species constantly changes. Species do not appear to converge to some optimal state, because as they evolve, the environment changes and so does the notion of fitness.
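
The thought experiment can be turned into a toy simulation. This is a minimal sketch under stated assumptions: the carrying capacity, the death probability, the doubling rule and the number of steps are all hypothetical values chosen for illustration. The one feature taken from the text is that, once resources are scarce, the probability of dying is the same for both kinds of organisms.

```python
import random


def simulate(n_static=1000, capacity=2000, death_prob=0.05, steps=400, seed=0):
    """Toy version of the thought experiment (all parameters are hypothetical).

    Returns the (non-reproducing, reproducing) counts after `steps` time steps.
    """
    rng = random.Random(seed)
    static, repro = n_static, 1  # the "accident": one organism starts multiplying
    for _ in range(steps):
        # Reproduction: each reproducer makes one copy, limited by the free room.
        room = capacity - (static + repro)
        repro += min(repro, max(room, 0))
        # Scarcity: once the world is full, every organism faces the same death risk.
        if static + repro >= capacity:
            static -= sum(rng.random() < death_prob for _ in range(static))
            repro -= sum(rng.random() < death_prob for _ in range(repro))
    return static, repro


static_left, repro_left = simulate()
print(f"non-reproducing: {static_left}, reproducing: {repro_left}")
```

Nothing in the simulation gives reproducing organisms a survival advantage per individual. Non-reproducing organisms disappear only because the vacated room is refilled by the reproducing ones, which is the sense in which the notion of "adapted" changed when the environment changed.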

In summary, the optimization criterion does not preexist the optimization process, unless we consider a broad existentialist criterion such as survival, but then the optimization metaphor loses its usefulness.

What is computational neuroscience? (XXV) - Are there biological models in computational neuroscience?

Computational neuroscience is the science of how the brain “computes”, that is, how the brain performs cognitive functions such as recognizing a face or walking. Here I will argue that most models of cognition developed in the field, especially as regards sensory systems, are actually not biological models but hybrid models consisting of a neural model together with an abstract model.

First of all, many neural models are not meant to be models of cognition. For example, there are models that are developed to explain the irregular spiking of cortical neurons, or oscillations. I will not consider them. According to the definition above, I categorize them in theoretical neuroscience rather than computational neuroscience. Here I consider for example models of perception, memory, motor control.

An example that I know well is the problem of localizing a sound source from timing cues. There are a number of models, including a spiking neuron model that we have developed (Goodman and Brette, 2010). This model takes as input two sound waves, corresponding to the two monaural sounds produced by the sound source, and outputs the estimated direction of the source. But the neural model, of course, does not output a direction. Rather, the output of the neural model is the activity of a layer of neurons. In the model, we consider that direction is encoded by the identity of the maximally active neuron. In another popular model in the field, direction is encoded by the relative total activity of two groups of neurons (see our comparison of models in Goodman et al. 2013). In all models, there is a final step which maps the activity of neurons to estimated sound location, and this step is not a neural model but an abstract model. This causes big epistemological problems when it comes to assessing and comparing the empirical value of models, because a crucial part of the models is not physiological. Some argue that neurons are tuned to sound location; others that population activity varies systematically with sound location. Both are right, and thus neither of these observations is a decisive argument to discriminate between the models.
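
To make the split between the neural part and the abstract part concrete, here is a minimal sketch with two decoders applied to the same layer activity. It is not the model of Goodman and Brette (2010) nor any published model: the Gaussian tuning curves, the number of neurons, the tuning width and all names are hypothetical choices made for illustration.

```python
import math

# Hypothetical layer of 21 neurons with Gaussian tuning to the interaural time
# difference (ITD, in microseconds). These numbers are illustrative only.
PREFERRED_ITDS = [-500 + 50 * i for i in range(21)]
TUNING_WIDTH = 150.0


def layer_activity(itd):
    """Neural part of the sketch: firing rates of the layer for a given ITD."""
    return [math.exp(-0.5 * ((itd - p) / TUNING_WIDTH) ** 2) for p in PREFERRED_ITDS]


def peak_decoder(rates):
    """Abstract part, version 1: preferred ITD of the maximally active neuron."""
    best = max(range(len(rates)), key=lambda i: rates[i])
    return PREFERRED_ITDS[best]


def hemispheric_decoder(rates):
    """Abstract part, version 2: relative total activity of two groups of neurons.

    Returns a number in [-1, 1]; turning it into a direction would require yet
    another abstract calibration step.
    """
    left = sum(r for r, p in zip(rates, PREFERRED_ITDS) if p < 0)
    right = sum(r for r, p in zip(rates, PREFERRED_ITDS) if p > 0)
    return (right - left) / (right + left)


rates = layer_activity(200.0)
print(peak_decoder(rates), hemispheric_decoder(rates))
```

The same activity pattern is consistent with both readouts: individual neurons are tuned, and the balance of population activity varies systematically with the stimulus. Only `layer_activity` is a (toy) neural model; both decoders live outside it.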

The same is seen in other sensory modalities. The output is the identity of a face; or of an odor; etc. The symmetrical situation occurs in motor control models: this time the abstract model is on the side of the input (mapping from spatial position to neural activity or neural input). Memory models face this situation twice, with abstract models both on the input (the thing to be memorized) and the output (the recall).

Fundamentally, this situation has to do with the fact that most models in computational neuroscience take a representational approach: they describe how neural networks represent in their firing some aspect of the external world. The representational approach requires defining a mapping (called the “decoder”) from neural activity to objective properties of objects, and this mapping cannot be part of the neural model. Indeed, sound location is a property of objects and thus does not belong to the domain of neural activity. So no sound localization model can ever be purely neuronal.

Thus to develop biological models, it is necessary to discard the representational approach. Instead of “encoding” things, neurons control the body; neurons are agents (rather than painters in the representational approach). For example, a model of sound localization should be a model of an orientational response, including the motor command. The model explains not how space is “represented”, but how an animal orients its head (for example) to a sound source. When we try to model an actual behavior, we find that the nature of the problem changes quite significantly. For example, because a particular behavior is an event, neural firing must also be seen as events. In this context, counting spikes and looking at the mutual information between the count and some stimulus property is not very meaningful. What matters is the events that the spikes trigger in the targets (muscles or other neurons). The goal is not to represent the sensory signals but to produce an appropriate behavior. One also realizes that the relation between sensory signals and actions is circular, and therefore cannot be adequately described as “processing”: sensory signals make you turn the head, but if you turn the head, the sensory signals change.

Currently, most models of cognition in computational neuroscience are not biological models. They include neuron models together with abstract models, a necessity stemming from the representational approach. To make a biological model requires including a model of the sensorimotor loop. I believe this is the path that the community should take.

What is computational neuroscience? (XXIV) - The magic of Darwin

Darwin’s theory of evolution is possibly the most important and influential theory in biology. I am not going to argue against that claim, as I do believe that it is a fine piece of theoretical work, and a great conceptual advance in biology. However, I also find that the explanatory power of Darwin’s theory is often overstated. I recently visited a public exhibition in a museum about Darwin. Nice exhibition overall, but I was a bit bothered by the claim that Darwin’s theory explains the origin, diversity and adaptedness of species, case solved. I have the same feeling when I read in many articles or when I hear in conversations with many scientists that such and such observed feature of living organisms is “explained by evolution”. The reasoning generally goes like this: such biological structure is apparently beneficial to the organism, and therefore the existence of that structure is explained by evolution. As if the emergence of that structure directly followed from Darwin’s account of evolution.

To me, the Darwinian argument is often used as magic, and is mostly void of any content. Replace “evolution” by “God” and you will notice no difference in the logical structure or arguments. Indeed, what the argument actually contains is 1) the empirical observation that the biological organism is apparently adapted to its environment, thanks to the biological feature under scrutiny; 2) the theoretical claim that organisms are adapted to their environment. Note that there is nothing in the argument that actually involves evolution, i.e., the change of biological organisms through some particular process. Darwin is only invoked to back up the theoretical claim that organisms are adapted, but there is nothing specifically about Darwinian evolution that is involved in the argument. It could well be replaced by God, Lamarck or aliens.

What makes me uneasy is that many people seem to think that Darwin’s theory fully explains how biological organisms get to be adapted to their environment. But even in its modern DNA form, it doesn’t. It describes some of the important mechanisms of adaptation, but there is an obvious gap. I am not saying that Darwin’s theory is wrong, but simply that it only addresses part of the problem.

What is Darwin’s theory of evolution? It is based on three simple steps: variation, heredity and selection. 1) Individuals of a given species vary in different respects. 2) Those differences are inherited. In the modern version, new variations occur randomly at this step, and so variations are introduced gradually over generations. 3) Individuals with adapted features survive and reproduce more than others (by definition of “adapted feature”), and therefore spread those features in the population. There is ample empirical evidence for these three claims, and that was the great achievement of Darwin.

The gap in the theory is the nature and distribution of variations. In the space of all possible small variations in structure that one might imagine, do we actually see them in a biological population? Well for one, there are a substantial number of individuals that actually survive for a certain time, so a large number of those variations are not destructive. Since the metaphor of the day is to see the genome as a code for a program, let us consider computer programs. Take a functional program and randomly change 1% of all the bits. What is the probability that 1) the program doesn’t crash, 2) it produces something remotely useful? I would guess that the probability is vanishingly small. You will note that this is not a very popular technique in software engineering. Another way to put it: consider the species of programs that calculate combinatorial functions (say, factorials, binomial coefficients and the like). Surely one might argue that individuals vary by small changes, but conversely, would a small random change in the code typically produce a new combinatorial function?
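
The bit-flipping guess can be tried on a toy case. This is a minimal sketch, assuming that "the program" is the source text of a small factorial function and that a mutation is an independent flip of each bit with probability 1%. The function, the mutation scheme and the test (does the mutant still compute 5! = 120?) are illustrative assumptions, not a claim about how genomes vary.

```python
import random

SOURCE = '''
def factorial(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result
'''


def mutate(source: str, rate: float, rng: random.Random) -> bytes:
    """Flip each bit of the UTF-8 source independently with probability `rate`."""
    data = bytearray(source.encode("utf-8"))
    for i in range(len(data)):
        for bit in range(8):
            if rng.random() < rate:
                data[i] ^= 1 << bit
    return bytes(data)


def still_works(mutant: bytes) -> bool:
    """True if the mutant decodes, compiles, runs and still computes 5! = 120."""
    try:
        namespace = {}
        exec(compile(mutant.decode("utf-8"), "<mutant>", "exec"), namespace)
        return namespace["factorial"](5) == 120
    except Exception:
        # Decoding errors, syntax errors, missing names, wrong types, and so on.
        return False


rng = random.Random(0)
trials = 500
survivors = sum(still_works(mutate(SOURCE, 0.01, rng)) for _ in range(trials))
print(f"{survivors} of {trials} mutants still compute 5! correctly")
```

With roughly 900 bits in the source, a 1% rate means several flipped bits per mutant, and almost any single flip is enough to break the syntax or the result. A mutant that computes a different but still meaningful combinatorial function is rarer still.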

So it doesn’t follow logically from the three steps of Darwin’s theory that biological organisms should be adapted and survive in changing environments. There is a critical ingredient that is missing: to explain how, in sharp contrast with programs, a substantial fraction of new variations are constructive rather than destructive. In more modern terms, how is it that completely random genetic mutations result in variations in phenotypes that are not arbitrary?

Again, I am not saying that Darwin is wrong, but simply that his theory only addresses part of the problem, and that it is not correct to claim that Darwin’s theory fully explains how biological organisms are adapted to their environment (i.e., perpetuate themselves). A key point, and a very important research question, is to understand how new variations can be constructive. This can be addressed within the Darwinian framework, as I outlined in a previous post. It leads to a view that departs quite substantially from the program metaphor. A simple remark: the physical elements that are subject to random variation cannot be mapped to the physical elements of structure (e.g. molecules) that define the phenotype, for otherwise those random variations would lead to random (i.e., mostly destructive) phenotypes. Rather, the structure of the organism must be the result of a self-regulatory process that can be steered by the elements subject to random variation. This is consistent with the modern view of the genome as a self-regulated network of genes, and with Darwin’s theory. But it departs quite substantially from the magic view of evolutionary theory that is widespread in the biological literature (at least in neuroscience), and instead points to self-regulation and optimization processes operating at the scale of the individual (not of generations).

What is computational neuroscience? (XXIII) On optimality principles in neuroscience

The notion of optimality of biological structures is both quite popular as a type of explanation and highly criticized by many scientists. It is worth understanding why exactly.

In a previous post, I observed that there are different types of explanations, one of which is final cause. Final cause would be for example the minimization of energy in physics or survival and reproduction in biology. Evolutionary theory makes final causes very important in biology. However, I find that such explanations are often rather weak. Such explanations generally take the form: we observe such biological structure because it is optimal in some sense. What exactly is explained or meant here is not always so clear. That a biological structure is optimal means that we consider a set of possible structures, and among that set the observed structure maximizes some criterion. But what is this set of possible structures? Surely not all possible molecular structures. Indeed evolutionary theory does not say that biological organisms are optimal. It says that changes in structure that occur from one generation to the next tend to increase “adaptability” of the organism (there are variations around this theme, such as gene-centered theories). Evolution is a process and biological organisms result from an evolutionary history: they are not absolute optima among all possible molecular structures (otherwise there would not be many species).

To see this, consider the following analogy. From the postulate that people tend to maximize their own interest, I propose the following explanation of social structure: rich people are rich because they want to be rich. Why is this explanation not satisfying? Because both poor and rich people tend to maximize their own interest (by assumption), and yet only the latter are rich. The problem is that we have specified a process that has a particular trend (increasing self-interest), but there is no necessity that this process reaches a general optimum of some sort. It is only optimal within a particular individual history. Maybe the poor people have always acted in their own interest, and maybe they are richer than they would be otherwise, but that doesn’t mean they end up rich. In the same way, evolution is a process and it only predicts an optimum within a particular evolutionary history.

Thus, the first remark is that optimality must be understood in the context of a process, both phylogenetic (species history) and ontogenetic (development), not as a global property. Optimality can only be local with respect to that process – after all, there are many species, not a single “optimal” one. That is to say, the fitness criterion (which has to be defined more precisely, see below) tends to increase along the process, so that, at equilibrium (assuming there is such a thing – see the theory of punctuated equilibria), the criterion is locally maximized with respect to that process (i.e., cannot increase by the process).
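
The difference between a process that increases a criterion and a global optimum can be shown with a minimal sketch. The one-dimensional "fitness landscape" with two peaks and the small-step climbing rule below are hypothetical, chosen only to show that the end point depends on the history (here, the starting point).

```python
import math


def fitness(x: float) -> float:
    """Hypothetical landscape: a low peak near x = 1 and a higher one near x = 4."""
    return math.exp(-(x - 1.0) ** 2) + 2.0 * math.exp(-(x - 4.0) ** 2)


def climb(x: float, step: float = 0.01, max_iters: int = 100_000) -> float:
    """Move by small steps only while fitness increases (a purely local process)."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            break  # no small change improves fitness: a local maximum
        x = best
    return x


end_a = climb(0.0)  # one history
end_b = climb(5.0)  # another history
print(f"from 0.0: x = {end_a:.2f}, fitness = {fitness(end_a):.2f}")
print(f"from 5.0: x = {end_b:.2f}, fitness = {fitness(end_b):.2f}")
```

Both runs follow exactly the same rule and both stop at a point where no small step helps, yet they end on different peaks. Each end point is optimal only with respect to the process and its history.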

This is the first qualification. There are at least two other types of criticisms that have been raised, which I want to address now, one empirical and one theoretical. The empirical criticism is that biological organisms are not always optimal. The theoretical criticism is that biological organisms do not need to be optimal but only “good enough”, and there might be no evolutionary pressure when organisms are good enough.

I will first address the empirical criticism: biological organisms are not always optimal. First, they are not expected to be, because of the above qualification. But this is not the deep point. This criticism raises the problem of specification: optimal with respect to what? The Darwinian argument only specifies (local) optimality with respect to survival and reproduction. But optimality is generally discussed with respect to some particular aspect of structure or behavior. The problem is that it is generally not obvious at all how the evolutionary fitness criterion should translate in terms of structure. This is the problem of specification.

For example, I have heard the argument that “people are not optimal”. I take it that it is meant that people are not rational. This is indeed a very well established fact of human psychology. If you haven’t read it yet, I invite you to read “Thinking, Fast and Slow” by Daniel Kahneman. There are all sorts of cognitive biases that make us humans not very rational in general. To give you a random example, take the “planning fallacy”: if you try to plan the duration of a substantial project (say, building a house or writing a book), then you will almost always underestimate it by an order of magnitude. The reason is that when planning, you imagine a series of steps that are necessary to achieve the project but you don’t imagine all the possible accidents that might happen (say the contractor dies). Any specific accident is very unlikely so you don’t or can’t think about it, but it is very likely that one accident of this type happens, and so you seriously underestimate the completion time. Annoyingly, you still do if you know about the fallacy (at least I still do). This is the problem of epistemic uncertainty (events that are not part of your probabilistic model, as opposed to probabilistic uncertainty, as in rolling a die – see e.g. “The Black Swan” by Taleb). So humans are not optimal with respect to the rationality criterion. Why is that? Perhaps rationality does not give you an evolutionary advantage. Or perhaps it would by itself, but it would also come with a very large cost in terms of maintaining the adequate structure. Or perhaps it would require such a different brain structure from what humans currently have that no evolutionary step could possibly take us there. Or perhaps it is just impossible to be rational, because the problem of epistemic uncertainty is so fundamental.
I am not trying to give an answer, but simply pointing out that the evolutionary argument does not imply that structure and behavior should be optimal with respect to all criteria that seem desirable. Evolutionary “fitness” is a complex notion that encompasses a set of contradicting subcriteria and history effects.
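
The arithmetic behind the planning fallacy example (each accident is very unlikely, but at least one is very likely) can be sketched in a few lines. The numbers are hypothetical, and the sketch assumes that accidents are independent, which real projects need not satisfy.

```python
def prob_at_least_one(p_single: float, n_accidents: int) -> float:
    """Probability that at least one of n independent accidents occurs."""
    return 1.0 - (1.0 - p_single) ** n_accidents


# Hypothetical numbers: 200 conceivable accidents, each with a 1% chance.
p_any = prob_at_least_one(0.01, 200)
print(f"P(at least one accident) = {p_any:.2f}")
```

With these assumed values the probability of at least one accident is roughly 0.87, even though no single accident is worth planning for.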

With this important qualification in mind, it should be noted that there are many aspects of biological structure and behavior that have been shown quite convincingly to be optimal or near-optimal with respect to appropriately chosen criteria. It would be sad to discard them, because those explanations give parsimonious accounts of large sets of empirical data. For example, while people are generally not rational or consistent in their reasoning and decisions, when it comes to perceptual or motor tasks it is well documented that humans tend to be near optimal, as accounted for by the Bayesian framework. There are of course important qualifications, but it is the case that many aspects of perception are well predicted by the Bayesian framework, at a quantitative (not just qualitative) level (note that I don’t mean perception in the phenomenological sense, but simply in the sense of sensory-related tasks). One big difference with the preceding example is that there is no epistemic uncertainty in these tasks; that is, when perceptual systems have a good statistical model of reality, then they seem to use it in a Bayesian-optimal way.

There are also convincing cases of optimality for biological structures. Robert Rosen discusses some of them in his book “Optimality principles in biology”, in particular the structure of the vascular system (which also seems to apply to lungs). Many geometrical aspects of the vascular system, such as angle and diameter at branching points and even the number of blood vessels can be accounted for by optimality principles with respect to appropriately chosen (but importantly, simple) criteria. This latter point is critical, as is pointed out in that book. Two criteria are simultaneously considered in this case: maximizing the surface of contact and minimizing the resistance to flow (and thus the required energy).

Another well documented case, this time in neurophysiology, is in the geometrical and molecular properties of axons. There is a short paper by Hodgkin that in my opinion shows pretty good optimality reasoning, including some of the qualifications I have mentioned: “The Optimum Density of Sodium Channels in an Unmyelinated Nerve” (1975). He starts by noting that the squid giant axon mediates the escape reflex, and it is critical for survival that this reflex is fast. Therefore speed of conduction along the axon is a good candidate for an optimality analysis: it makes sense, from an evolutionary viewpoint, that the structure of the axon is optimized for speed. Then he tries to predict the density of sodium channels that would maximize speed. As it turns out, this simple question is itself quite complex. He argues that each channel also increases the membrane capacitance, in a way that depends on voltage because the geometrical conformation of the channels is voltage-dependent. Nonetheless he manages to estimate that effect and derives an optimal channel density, which turns out to be of the right order of magnitude (compared with measurements). He also notes that the relation between channel density and velocity has a “flat maximum”, so the exact value might also depend on other aspects than conduction speed.

He then discusses those other aspects at the end of the text. He notes that in other cases (different axons and species), the prediction based on speed does not work so well. His argument then is that speed may simply not be the main relevant criterion in those other cases. It was in the case of the squid axon because it mediates a time-critical escape reflex, but in other cases speed may not be so important and instead energy consumption might be more relevant. Because the squid axon mediates an escape reflex, it very rarely spikes and so energy consumption is presumably not a big issue – compared to being eaten alive because you are too slow. But energy consumption might be a more important criterion for axons that fire more often (say, cortical neurons in mammals). There is indeed a large body of evidence that tends to show that many properties of spike initiation and propagation are adapted for energy efficiency (again, with some qualifications, e.g. fast-spiking cells are thought to be less efficient, because this seems necessary in order to fire at high rates). There are other structures where axon properties seem to be tuned for isochrony, yet another type of criterion. Isochrony means that spikes produced by different neurons arrive at the same time at a common projection. This seems to be the case in the optic tract (Stanford 1987, “Conduction Velocity Variations Minimize Conduction Time Differences Among Retinal Ganglion Cell Axons”) and many other structures, for example the binaural system of birds. Thus many aspects of axon structure seem to show a large degree of adaptation, but to a diversity of functional criteria, and it often involves trade-offs. This concludes this discussion of the problem of specification.

The second criticism is not empirical but theoretical: biological organisms do not need to be optimal but only “good enough”, and there might be no evolutionary pressure when organisms are good enough. There is an important sense in which this is true. This is highlighted by Hodgkin in the paper I mentioned: there is a broad range of values for channel density that leads to near-optimal (“good enough”) velocities, and so the exact value might depend on other, less important, criteria, such as energy consumption. But note that this reasoning is still about optimality; simply, it is acknowledged that organisms are not expected to be optimal with respect to any single criterion, since survival depends on many aspects. A related point is that of robustness and redundancy. It appears that there is a lot of redundancy in biological systems. You could lose a kidney and still be fine, and this is also true at the cellular level. This again can be thought of in terms of epistemic uncertainty: you could build something that is optimal with respect to a particular model of the world, but it might make that something very fragile to unexpected perturbations, events that are not predicted by the model. Thus redundancy or more generally robustness is desirable, even though it makes organisms suboptimal with respect to any specific model.

But note that we have not left the framework of evolutionary fitness, since we have described redundancy as a desirable feature (as opposed to a random feature). We have simply refined the concept of optimality, which it should be clear now is quite complex, as it must be understood with respect to a constellation of possibly contradictory subcriteria as well as with respect to epistemic uncertainty. But we are still making all those qualifications within the framework of adaptive fitness of organisms. This does not mean that biological organisms can be suboptimal because of lack of strong evolutionary pressure. More precisely, it means that they can be suboptimal with respect to a particular criterion for which there is a lack of strong evolutionary pressure, if the same structure is also subjected to evolutionary pressure on another criterion. These two criteria could be for example conduction speed and energy consumption.

Yet it could be (and has been) argued that even if a structure were subjected to a single criterion, it might still not be optimal with respect to that criterion if evolutionary pressure is weak. For example, it is often stated that spikes of the squid giant axon are not efficient, as in the Hodgkin-Huxley model they are about 3-4 times more energetically expensive than strictly necessary. Because those axons fire very rarely, it makes little difference whether spikes are efficient or not. Considering this fact, spike efficiency is “good enough”.

I find this theoretical argument quite weak. First, let me note that the 3-4 factor applies to the Hodgkin-Huxley model, which was calibrated mainly for action potential shape; the model has since been refined, and the factor is actually smaller (see e.g. work by Bezanilla). But this is not the important point. From a theoretical viewpoint, even if evolutionary pressure is weak, it is not nonexistent. By a simple argument I presented before, biological organisms must live in environments where resources are scarce, and so there is strong pressure for efficient use of energy and resources in general. Thus even if the giant axon’s spike is a small proportion of that use, there is still some pressure on its efficiency. Squids are a very old species and there seems to be no reason why that pressure might not have applied at some point. But again this is not the most important point. In my view, the most important point is that evolutionary pressure does not apply at the level of individual elements of structure (e.g., on each axon). It applies at the level of genes, which have impact on the entire organism, or at least a large part of it. So the question is not whether the squid giant axon is energy efficient, but rather whether spike conduction along axons is efficient. It is also quite possible that mechanisms related to metabolism are more generic. Thus, while there might be little evolutionary pressure on that particular axon, there certainly is on the set of all axons.

Why then is the squid giant axon inefficient? (I’m assuming it actually is, although to a lesser degree than usually stated.) Here is a possible explanation. Efficiency of spike propagation and initiation depends on the properties of ionic channels. In particular, to have spikes that are both fast and efficient you need sodium channels to inactivate very fast. There is likely a limit to how fast inactivation can be, since proteins are discrete structures (which might explain why fast-spiking cortical neurons are relatively inefficient). In mammals, fast inactivation is conveyed not by the main protein of the sodium channel (alpha subunit) but by so-called beta subunits, which are distinct proteins that modulate channel properties. This comes with a cost, since all those proteins need to be actively maintained (the resting cost). If the neuron spikes often, most of the energetic cost is incurred by spikes. If it spikes very rarely, most of the energetic cost is the resting cost. When that resting cost is taken into account, it might well be that spikes in the squid giant axon are actually quite efficient. The same line of reasoning might explain why such a big axon is not myelinated (or doesn’t show a similar kind of adaptation): myelination decreases the energetic cost of spike propagation for large diameter axons, but it increases the resting cost (you need glial cells producing myelin).
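The trade-off between spike cost and resting cost can be illustrated with a minimal sketch. The two "designs" and all numbers below are hypothetical, in arbitrary units, and are not measurements; they only show that which design is cheaper depends on the firing rate.

```python
def energy_per_second(rate_hz, cost_per_spike, resting_cost):
    """Total energetic cost per second: resting cost plus cost of spikes."""
    return resting_cost + rate_hz * cost_per_spike

# Hypothetical designs (arbitrary units, NOT measured values):
# "simple": expensive spikes, few proteins to maintain.
# "elaborate": cheap spikes, but a higher resting (maintenance) cost.
SIMPLE = {"cost_per_spike": 4.0, "resting_cost": 10.0}
ELABORATE = {"cost_per_spike": 1.0, "resting_cost": 15.0}

def cheaper_design(rate_hz):
    """Return the name of the design with the lower total cost at this rate."""
    simple_cost = energy_per_second(rate_hz, **SIMPLE)
    elaborate_cost = energy_per_second(rate_hz, **ELABORATE)
    return "simple" if simple_cost < elaborate_cost else "elaborate"

# A neuron firing a few times per minute versus one firing at 50 Hz.
print(cheaper_design(0.05))
print(cheaper_design(50.0))
```

With these made-up numbers, the design with inefficient spikes wins at very low rates, because the resting cost dominates the total.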

To conclude: optimality principles are important in biology because these are principles that are specific to living organisms (i.e., they are somehow adapted) and they do explain a large body of empirical data. However, these must be applied with care, keeping in mind the problem of specification (optimality with respect to what; i.e. what is actually important for survival), the problem of history effects (optimality is local relative to phylogenetic and ontogenetic processes) and the problem of epistemic uncertainty (leading to robustness principles).

 

Update: I noticed that in Hodgkin’s “Sherrington Lectures” (1964), the author estimates that the mean firing frequency of the giant axon in the life of a squid does not exceed a few impulses per minute, which should produce an amount of sodium intake due to spikes of about 1/300 the amount without spikes (leak). Thus the cost of spikes is indeed a negligible proportion of the total cost of axon maintenance.

What is computational neuroscience? (XXII) The whole is greater than the sum of its parts

In this post, I want to come back to methodological reductionism, the idea that the right way, or the only way, to understand the whole is to understand the elements that compose it. A classical rebuttal of methodological reductionism is that the “whole is greater than the sum of its parts” (Aristotle). I feel that this argument is often misunderstood, so I have thought of a simple example from biology.

Cells are enclosed by membranes, which are made of lipids. A membrane is a closed surface that defines an interior and an exterior. No part of a membrane is a membrane, because it is not a closed surface. You could study every single lipid molecule that forms a membrane in detail, and you would still have no understanding of what a membrane is, despite the fact that these molecules are all there is in the membrane (ontological reductionism), and that you have a deep understanding of every single one of them. This is because a membrane is defined as a particular relationship between the molecules, and therefore is not contained in or explained by any of them individually.

There is another important epistemological point in this example. You might want to take a “bottom-up” approach to understanding what a membrane is. You would start by looking at a single lipid molecule. Then you could take a larger patch of membrane and study it, building on the knowledge you have learned from the single molecule. Then you could look at larger patches of membrane to understand how they differ from smaller patches; and so on. However, at no stage in this incremental process do you approach a better understanding of what a membrane is, because the membrane only exists in the whole, not in a part of it, even a big part. “Almost a membrane” is not a membrane. In terms of models, a simple model of a cell membrane consisting of only a small number of lipid molecules arranged as a closed surface captures what a membrane is much better than a large-scale model consisting of almost all molecules of the original cell membrane.

This criticism applies in particular to purely data-driven strategies to understand the brain. You could think that the best model of the brain is the one that includes as much detailed empirical information about it as possible. The fallacy here is that no part of the brain is a brain. An isolated cortex in a box, for example, does not think or behave. A slice of brain is also not a brain. Something “close to the brain” is still not a brain. A mouse is a better model of a human than half a human, which is bigger and physically more similar but dead. This is the same problem as for understanding a membrane (a much simpler system!): the methodologically reductionist strategy misses that it is not the elements themselves that make the whole, it is the relationship between the elements. So the key to understanding such systems is not to increase the level of detail or similarity, but to capture relevant higher-order principles.

What is computational neuroscience? (XXI) Lewis Carroll and Norbert Wiener on detailed models

The last published novel of Lewis Carroll, Sylvie and Bruno (1893 for the second volume), contains a passage that explains that a high level of detail is not necessarily what you want from a model. I quote it in full:

“What a useful thing a pocket-map is!” I remarked.

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

In other words: if the model is nearly as complex as the thing it applies to, then it is no more useful than the thing itself. This theme also appears in a 1945 essay by Arturo Rosenblueth and Norbert Wiener, “The Role of Models in Science”:

“The best material model for a cat is another, or preferably the same cat. In other words, should a material model thoroughly realize its purpose, the original situation could be grasped in its entirety and a model would be unnecessary. […] This ideal theoretical model cannot probably be achieved. Partial models, imperfect as they may be, are the only means developed by science for understanding the universe. This statement does not imply an attitude of defeatism but the recognition that the main tool of science is the human mind and that the human mind is finite.”

The last sentence is the most important: a model is not something that is meant to mimic reality; it is something that is constructed by and for the human mind to help it grasp complex aspects of reality.

What is computational neuroscience? (XX) What is a realistic model?

What is a realistic neuron model? There is a hierarchy among neuron models, which goes like this: the least realistic model is the integrate-and-fire model, which is phenomenological; then the single-compartment Hodgkin-Huxley model; then multicompartmental Hodgkin-Huxley models (this hierarchy is questioned by a recently accepted paper that I wrote, but I will discuss it when the paper is out).

But what is meant exactly by “realistic”? Take two models of a plane: a toy plane made of wood, and a simple paper plane. The first model certainly looks more like a plane. It has different recognizable elements of a plane: wings, propellers, a cockpit. One might say that this model is more realistic. The second model doesn’t have a cockpit, and in fact doesn’t really look like a plane. However, unlike the first model, it flies – definitely an important characteristic of planes. So which one is more realistic?

There are generally two types of answers to justify the fact that the Hodgkin-Huxley model (HH) is more realistic than the integrate-and-fire model (IF). One is: the HH model has ionic channels, the IF model doesn’t. Another one is: the HH model has been proven right with experiments.

Let us start with the first type of answer. Strictly speaking, the HH model does not have ionic channels. Ionic channels are proteins. The HH model is a set of equations. There are parts of these equations that we identify with properties of proteins, but they are not the real things. Saying that the HH model has ionic channels is like saying that the wooden plane has a propeller: there is something we call a “propeller”, yes, but functionally it is not a propeller, it is a nice-looking piece of wood. Specifically, in the HH model, the sodium gating variable (m) has no biophysical counterpart in the actual sodium channel. The sodium current in the HH model corresponds to something that can be physically measured, but it is described as proportional to the third power of gating variable m, only because exponent 3 was the best fit to Hodgkin and Huxley’s data. We call it a “gating” variable only because it is part of a story in which it is a gating variable: the story that there are three independent gates that must all be open for the channel to be open. It is an attractive story, but we now know that this is not what happens with the sodium channel. So the model is consistent with a story in which there is a neuron with sodium channels, but the story is not an accurate description of reality. We might call this “wooden plane realism”.
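To make this concrete, here is a minimal sketch of how the sodium current is written in the HH formalism. The inactivation variable h, the maximal conductance g_na and the reversal potential e_na are not discussed above; the default values are commonly quoted textbook values and should be read as assumptions. The exponent is exposed as a parameter only to stress that 3 is a fitting choice, not a count of physical gates.

```python
import math

def sodium_current(v_mv, m, h, g_na=120.0, e_na=50.0, exponent=3):
    """HH-style sodium current density (uA/cm^2).

    v_mv: membrane potential (mV); m, h: gating variables in [0, 1].
    g_na (mS/cm^2) and e_na (mV) are assumed textbook-style values.
    """
    return g_na * m**exponent * h * (v_mv - e_na)

# Same state of the "gates", two different exponents: the measurable
# current changes, but nothing in the equations says which exponent
# corresponds to a real property of the channel protein.
i_fit = sodium_current(0.0, m=0.5, h=0.6)               # exponent 3 (HH fit)
i_alt = sodium_current(0.0, m=0.5, h=0.6, exponent=2)   # hypothetical alternative
print(i_fit, i_alt)
```

The current is negative (inward) below the reversal potential in both cases; only the fit to data distinguishes one exponent from the other.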

The second type of answer is more scientific in its expression. However, it is a bit ambiguous. What Hodgkin and Huxley proved is that their model was an accurate description of the electrical behavior of a squid giant axon, which was space-clamped with a metal wire. But when we claim that the HH model is realistic, we mean something more general than that. We mean that the same “kind” of model would successfully account for electrical behavior of other neurons. It would not be exactly the same model, because parameters and ionic channels would be different, and would have to be properly adjusted. So in fact it is rather the HH theory or formalism that is meant to be more realistic. However, for a given neuron, the HH “model” is only more realistic if the structure and parameters of the model are properly adjusted for that given neuron.

These remarks touch on several epistemological concepts that have been described by Karl Popper (The logic of scientific discovery, 1935). The first one is the notion of “empirical content” of a theory, which is defined as the set of possible falsifiers of the theory. In short, for a model, it is the type of (non-tautological) predictions that a model can make. For example, the integrate-and-fire model can make predictions about the membrane potential and the spike times, as a function of the input current. The HH model can additionally make predictions about the sodium and potassium currents. This is just about the logical structure of the models, in their articulation with empirical data, not about whether the models are accurate or not. We can consider greater empirical content as a more satisfying way to rephrase the idea that the HH model is more realistic because it “has” ionic channels. But it is a mistake to identify realism with empirical content: a theory can have a very large empirical content and make predictions that turn out to be all completely wrong.
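As an illustration of what the empirical content of the integrate-and-fire model looks like, here is a minimal sketch of a leaky integrate-and-fire neuron. The leak term, the forward Euler integration and all parameter values (in arbitrary units) are assumptions chosen for illustration. The point is the signature of the function: an input current goes in, and a membrane potential trace and spike times come out; there is nothing in it that could be compared with a measured sodium or potassium current.

```python
def simulate_lif(current, dt=0.1, tau=10.0, resistance=1.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron, forward Euler integration.

    tau * dV/dt = -(V - v_rest) + resistance * I(t)
    A spike is recorded and V is reset whenever V reaches v_thresh.
    Returns (membrane potential trace, list of spike times).
    """
    v = v_rest
    trace, spike_times = [], []
    for step, i_in in enumerate(current):
        v += dt / tau * (-(v - v_rest) + resistance * i_in)
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
        trace.append(v)
    return trace, spike_times

# 100 time units of constant input, above and below threshold.
trace, spikes = simulate_lif([1.5] * 1000)
quiet_trace, quiet_spikes = simulate_lif([0.5] * 1000)
print(len(spikes), len(quiet_spikes))
```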

Related to this notion is the “levels of universality”. Consider these two statements (taken from Popper): all orbits of planets are ellipses; all orbits of heavenly bodies are ellipses. The second statement is more universal, because planets are heavenly bodies. So in this sense it is a better theory. HH theory has this quality of being quite universal: it is meant to apply to spiking and non-spiking neurons, for example.

Finally, a theory can be characterized by its “degree of precision”. Taking again an example from Popper: all orbits of planets are circles; all orbits of planets are ellipses. Independently of the empirical validity of these two statements, the first one is more precise than the second one, because all circles are ellipses. Applied to models, this is related to the number of parameters that are left unspecified. For example, multicompartmental models have a greater empirical content than single-compartment models, because they can make predictions about membrane potential at different locations on the dendritic tree. However, they are not necessarily more realistic because they are less precise: there are many unspecified parameters, and the additional empirical content is only accurate if these parameters are properly set.

So in fact there are two aspects of realism that can be discussed about models. One has to do with the logical structure of the model: what cases it is meant to apply to (empirical content, universality), how precise it is in its predictions (precision); in other words: the ambition of the model. On this dimension, one seeks models with greater universality, greater empirical content, greater precision. Another way to phrase it is to say that a useful model is one that has many opportunities to be wrong. It is less easy than we might think to compare HH and IF models on this dimension: on one hand the HH model is more universal, but on the other hand it is less precise than the IF model (for example, an HH model does not necessarily spike).

This first aspect has nothing to do with how accurate the model is, with respect to empirical observations. It only has to do with the logical structure of the model. The second aspect has to do with empirical validity: how accurate the model predictions are. For example, we could well imagine that a phenomenological model produces more accurate predictions than a biophysical model, which has a greater empirical content. In this case the biophysical model makes more predictions, but they do not match empirical observations as well as the phenomenological model. Which model is more realistic?

What is computational neuroscience? (XIX) Does the brain process information?

A general phrase that one reads very often about the brain in the context of perception is that it “processes information”. I have already discussed the term “information”, which is ambiguous and misleading. But here I want to discuss the term “process”. Is it true that the brain is in the business of “information processing”?

“Processing” refers to a procedure that takes something and turns it into something else by a sequence of operations, for example trees into paper. So the sentence implies that what the brain is doing is transforming things into other things. For example, it transforms the image of a face into the identity of the face. The coding paradigm, and more generally the information-processing paradigm, relies on this view.

I will take a concrete example. Animals can localize sounds, based on some auditory cues such as the level difference between the two ears. In the information processing view, what sound localization means is a process that takes a pair of acoustic signals and turns it into a value representing the direction of the sound source. However, this is not literally what an animal does.

Let us take a cat. The cat lives and, most of the time, does nothing. Through its ears, it receives a continuous acoustic flow. This flow is transduced into electrical currents, which trigger some activity in the brain, that is, electrical events happening. At some moment in time, a mouse scratches the ground for a second, and the cat turns its eyes towards the source, or perhaps crawls to the mouse. During an extended period of time, the mouse is there in the world, and its location exists as a stable property. What the cat “produces”, on the other hand, is a discrete movement with properties that one can relate to the location of the mouse. Thus, sound localization behavior is characterized by discrete events that occur in a continuous sensory flow. Behavior is not adequately described as a transformation of things into things, because behavior is an event, not a thing: it happens.

The same remark applies to neurons. While a neuron is a thing that exists, a spike is an event that happens. It is a transient change in electrical properties that triggers changes in other neurons. As the term “neural activity” clearly suggests, a spike is not a “thing” but an event, an action on other neurons or muscles. But the notion of information processing implies that neural activity is actually the end result of a process rather than the process itself. There is a confusion between things and events. In a plant that turns trees into paper, trees and paper are the things that are transformed; the action of cutting trees is not one of these things that are transformed. Yet this is what the information processing metaphor says about neural activity.

There are important practical implications for neural models. Traditionally, these models follow the information-processing paradigm. There is an input to the model, for example a pair of acoustical signals, and there is an output, for example an estimate of sound location (I have worked on this kind of model myself, see e.g. Goodman & Brette, PLoS Comp Biol 2010). The estimate is generally calculated from the activity of the neurons over the course of the simulation, which corresponds to the time of the sound. For example, one could select the neuron with the maximum firing rate and map its index to location; or one could compute an estimate based on population averages, etc. In any case, there is a well-defined input corresponding to a single sound event, and a single output value corresponding to the estimated location.
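Here is a minimal sketch of the two kinds of readout just mentioned. It is not the method of any particular paper; the preferred directions and spike counts are hypothetical. Note what the functions take as input: spike counts accumulated over the whole simulation, that is, over a single, pre-delimited sound.

```python
def decode_peak(spike_counts, preferred_directions):
    """Map the index of the most active neuron to its preferred direction."""
    best = max(range(len(spike_counts)), key=lambda i: spike_counts[i])
    return preferred_directions[best]

def decode_weighted_average(spike_counts, preferred_directions):
    """Average of preferred directions, weighted by spike counts."""
    total = sum(spike_counts)
    weighted = sum(c * d for c, d in zip(spike_counts, preferred_directions))
    return weighted / total

# Hypothetical population: preferred azimuths in degrees, and the spike
# counts of each neuron summed over the entire duration of one sound.
preferred = [-90, -45, 0, 45, 90]
counts = [2, 5, 20, 9, 1]

peak_estimate = decode_peak(counts, preferred)
average_estimate = decode_weighted_average(counts, preferred)
print(peak_estimate, average_estimate)
```

Both readouts return one number per trial, and it is the modeler who decides when the trial starts and ends.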

Now try to embed this kind of model into a more realistic scenario. There is a continuous acoustic flow. Sounds are presented at various locations in sequence, with silent gaps between them. The model must estimate the locations of these sounds. We have a first problem, which is that the model produces estimates based on total activity over time, and this is clearly not going to work here since there is a sequence of sounds. The model could either produce a continuous estimate of source location (the equivalent of continuously pointing to the source), or it could produce an estimate of source location at specific times (the equivalent of making a discrete movement to the source), for example when the sounds stop. In either case, what is the basis for the estimate, since it cannot be the total activity any more? If it is a continuous estimate, how can it be a stable value if neurons have transient activities? More generally, how can the continuous flow of neural activity produce a discrete movement to a target position?

Thus, sound localization behavior is more than a mapping between pairs of signals and direction estimates. Describing perception as “information processing” entails the following steps: a particular time interval of sensory flow is selected and considered as a thing (rather than a flow of events); a particular set of movements is considered and some of its properties are extracted (e.g. direction); what the brain does is described as the transformation of the first thing into the second thing. Thus, it is an abstract construction by an external observer.

Let me summarize this post and the previous one. What is wrong about “information processing”? Two things are wrong. First (previous post), the view that perception is the transformation of information of some kind into information of another kind is self-contradictory, because a signal can only be considered “information” with respect to a perceptual system. This view of perception therefore proposes that there are things to be perceived by something else than the perceptual system. Second (this post), “processing” is the wrong term because actions produced by the brain are not things but events: it is true at the scale of the organism (behavior) and it is true at the scale of neurons (spikes). Both behavior and causes of behavior are constituted by events, not things. It is also true of the mind (phenomenal consciousness). A thing can be transformed into another thing; an event happens.

What is computational neuroscience? (XVIII) Representational approaches in computational neuroscience

Computational neuroscience is the science of how the brain “computes”: how it recognizes faces or identifies words in speech. In computational neuroscience, standard approaches to perception are representational: they describe how neural networks represent in their firing some aspect of the external world. This means that a particular pattern of activity is associated with a particular face. But who makes this association? In the representational approach, it is the external observer. The approach only describes a mapping between patterns of pixels (say) and patterns of neural activity. The key step, of relating the pattern of neural activity to a particular face (which is in the world, not in the brain), is done by the external observer. How then is this about perception?

This is an intrinsic weakness of the concept of a “representation”: a representation is something (a painting, etc) that has a meaning for some observer, it is not about how this meaning is formed. Ultimately, it does not say much about perception, because it simply replaces the problem of how patterns of photoreceptor activity lead to perception by the problem of how patterns of neural activity lead to perception.

A simple example is the neural representation of auditory space. There are neurons in the auditory brainstem whose firing is sensitive to the direction of a sound source. One theory proposes that the sound's direction is signaled by the identity of the most active neuron (the one that is “tuned” to that direction). Another one proposes that it is the total firing rate of the population, which covaries with direction, that indicates sound direction. Some other theory considers that sound direction is computed as a “population vector”: each neuron codes for a direction, and is associated with a vector oriented in that direction, with a magnitude equal to its firing rate; the population vector is the sum of all these vectors.
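The population vector readout can be sketched in a few lines. The preferred directions and firing rates below are hypothetical, and the formula is applied by us, outside the model, to the firing rates.

```python
import math

def population_vector_direction(rates, preferred_deg):
    """Direction (degrees) of the sum of the neurons' vectors.

    Each neuron contributes a vector pointing in its preferred direction,
    with a length equal to its firing rate.
    """
    x = sum(r * math.cos(math.radians(d)) for r, d in zip(rates, preferred_deg))
    y = sum(r * math.sin(math.radians(d)) for r, d in zip(rates, preferred_deg))
    return math.degrees(math.atan2(y, x))

# Hypothetical population of four neurons and their firing rates (Hz).
preferred = [0, 90, 180, 270]
rates = [10.0, 10.0, 0.0, 0.0]

estimate = population_vector_direction(rates, preferred)
print(estimate)
```

The function returns a number that we read as an angle; the neurons themselves only produce firing rates.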

Implicit in these representational theories is the idea that some other part of the brain “decodes” the neural representation into sound's direction, which ultimately leads to perception and behavior. However, this part is left unspecified in the model: neural models stop at the representational level, and the decoding is done by the external observer (using some formula). But the postulate of a subsequent neural decoder is problematic. Let us assume there is one. It takes the “neural representation” and transforms it into the target quantity, which is sound direction. But the output of a neuron is not a direction, it is a firing pattern or rate that can perhaps be interpreted as a direction. So how is sound direction represented in the output of the neural decoder? It appears that the decoder faces the same conceptual problem, which is that the relationship between output neural activity and the actual quantity in the world (sound direction) has to be interpreted by the external observer. In other words, the output is still a representation. The representational approach leads to an infinite regress.

Since neurons are in the brain and things (sound sources) are in the world, the only way to avoid an external “decoding” stage that relates the two is to include both the world and the brain in the perceptual model. In the example above, this means that, to understand how neurons estimate the direction of a sound source, one would not look for the “neural representation” of sound sources but for neural mechanisms that, embedded in an environment, lead to some appropriate orienting behavior. In other words, neural models of perception are not complete without an interaction with the world (i.e., without action). In this new framework, “neural representations” become a minor issue, one for the external observer looking at neurons.