What is computational neuroscience? (XXX) Is the brain a computer?

It is sometimes stated as an obvious fact that the brain carries out computations. Computational neuroscientists sometimes see themselves as looking for the algorithms of the brain. Is it true that the brain implements algorithms? My point here is not to answer this question, but rather to show that the answer is not self-evident, and that it can only be true (if at all) at a fairly abstract level.

One line of argumentation is that models of the brain that we find in computational neuroscience (neural network models) are algorithmic in nature, since we simulate them on computers. And wouldn’t it be a sort of vitalistic claim that neural networks cannot be (in principle) simulated on a computer?

There is an important confusion in this argument. At a low level, neural networks are modelled biophysically as dynamical systems, in which the temporality corresponds to the actual temporality of the real world (as opposed to the discrete temporality of algorithms). Mathematically, those are typically differential equations, possibly hybrid systems (i.e. coupled by timed pulses), in which time is a continuous variable. Those models can of course be simulated on a computer using discretization schemes. For example, we choose a time step and compute the state of the network at time t+dt from the state at time t. This algorithm, however, implements a simulation of the model; it is not the model that implements the algorithm. The discretization is nowhere to be found in the model. The model itself, being a continuous-time dynamical system, is not algorithmic in nature. It is not described as a discrete sequence of operations; it is only the simulation of the model that is algorithmic, and different algorithms can simulate the same model.
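
To make the distinction concrete, here is a minimal simulation sketch (the equation and numbers are arbitrary, chosen only for illustration). The model is the differential equation tau dV/dt = -V + I; the time steps and update rules belong only to the simulations, and two different algorithms approximate the same model:

    import numpy as np

    # Model (continuous time): tau * dV/dt = -V + I. The model is just this
    # equation; the time steps below exist only in the simulations.
    tau, I, T = 0.02, 1.0, 0.2   # time constant (s), input, duration (s)

    def euler(dt):
        """Forward Euler discretization of the model."""
        V = 0.0
        for _ in range(int(T / dt)):
            V += dt * (-V + I) / tau
        return V

    def exponential(dt):
        """A different algorithm (exact update between steps) for the same model."""
        V = 0.0
        for _ in range(int(T / dt)):
            V = I + (V - I) * np.exp(-dt / tau)
        return V

    for dt in (1e-3, 1e-4):
        print(dt, euler(dt), exponential(dt))
    # All four numbers approach the analytical value I*(1 - exp(-T/tau)) ≈ 0.99995:
    # same model, different algorithms and time steps.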

If we put this confusion aside, then the claim that neural networks implement algorithms becomes much less obvious. It means that trajectories of the dynamical system can be mapped to the discrete flow of an algorithm. This requires: 1) identifying states with representations of some variables (for example, stimulus properties or symbols); 2) identifying trajectories from one state to another as specific operations. In addition to that, for the algorithmic view to be of any use, there should be a sequence of operations, not just one operation (i.e., describing the output as a function of the input is not an algorithmic description).

A key difficulty in this identification is temporality: the state of the dynamical system changes continuously, so how can this be mapped to discrete operations? A typical approach in neuroscience is to consider not states but properties of trajectories. For example, one would consider the average firing rate in a population of neurons in a given time window, and the rate of another population in another time window. The relation between these two rates in the context of an experiment would define an operation. As stated above, a sequence of such relations should be identified in order to qualify as an algorithm. But this mapping seems only possible within a feedforward flow; coupling poses a greater challenge for an algorithmic description. No known nervous system, however, has a feedforward connectome.

I am not claiming here that the function of the brain (or mind) cannot possibly be described algorithmically. Probably some of it can be. My point is rather that a dynamical system is not generically algorithmic. A control system, for example, is typically not algorithmic (see the detailed example of Tim van Gelder, What might cognition be if not computation?). Thus a neural dynamical system can only be seen as an algorithm at a fairly abstract level, which can probably address only a restricted subset of its function. It could be that control, which also attaches function to dynamical systems, is a more adequate metaphor of brain function than computation. Is the brain a computer? Given the rather narrow application of the algorithmic view, the reasonable answer should be: quite clearly not (maybe part of cognition could be seen as computation, but not brain function generally).

What is computational neuroscience? (XXIX) The free energy principle

The free energy principle is the theory that the brain manipulates a probabilistic generative model of its sensory inputs, which it tries to optimize by either changing the model (learning) or changing the inputs (action) (Friston 2009; Friston 2010). The “free energy” is related to the error between predictions and actual inputs, or “surprise”, which the organism wants to minimize. It has a more precise mathematical formulation, but the conceptual issues I want to discuss here do not depend on it.
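
For reference, the usual variational formulation is the following (the argument below does not depend on it). With sensory inputs \(x\), hidden causes \(z\), generative model \(p\) and approximate posterior ("recognition density") \(q\), the free energy is

\[
F = \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x,z)\big] = D_{\mathrm{KL}}\big(q(z)\,\|\,p(z\mid x)\big) - \log p(x).
\]

Since the KL term is non-negative, \(F\) is an upper bound on the surprise \(-\log p(x)\); minimizing it with respect to \(q\) is inference (perception), minimizing it with respect to the generative model is learning, and minimizing it with respect to actions that change \(x\) is the account of behavior discussed below.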

Thus, it can be seen as an extension of the Bayesian brain hypothesis that accounts for action in addition to perception. It shares the conceptual problems of the Bayesian brain hypothesis, namely that it focuses on statistical uncertainty, inferring variables of a model (called “causes”) when the challenge is to build and manipulate the structure of the model. It also shares issues with the predictive coding concept, namely that there is a conflation between a technical sense of “prediction” (expectation of the future signal) and a broader sense that is more ecologically relevant (if I do X, then Y will happen). In my view, these are the main issues with the free energy principle. Here I will focus on an additional issue that is specific to the free energy principle.

The specific interest of the free energy principle lies in its formulation of action. It resonates with a very important psychological theory called cognitive dissonance theory. That theory says that you try to avoid dissonance between facts and your system of beliefs, by either changing the beliefs in a small way or avoiding the facts. When there is a dissonant fact, you generally don’t throw your entire system of beliefs: rather, you alter the interpretation of the fact (think of political discourse or in fact, scientific discourse). Another strategy is to avoid the dissonant facts: for example, to read newspapers that tend to have the same opinions as yours. So there is some support in psychology for the idea that you act so as to minimize surprise.

Thus, the free energy principle acknowledges the circularity of action and perception. However, it is quite difficult to make it account for a large part of behavior. A large part of behavior is directed towards goals; for example, to get food and sex. The theory anticipates this criticism and proposes that goals are ingrained in priors. For example, you expect to have food. So, for your state to match your expectations, you need to seek food. This is the theory’s solution to the so-called “dark room problem” (Friston et al., 2012): if you want to minimize surprise, why not shut off stimulation altogether and go to the closest dark room? Solution: you are not expecting a dark room, so you are not going there in the first place.

Let us consider a concrete example to show that this solution does not work. There are two kinds of stimuli: food, and no food. I have two possible actions: to seek food, or to sit and do nothing. If I do nothing, then with 100% probability, I will see no food. If I seek food, then with, say, 20% probability, I will see food.

Let’s say this is the world in which I live. What does the free energy principle tell us? To minimize surprise, it seems clear that I should sit: I am certain not to see food. No surprise at all. The proposed solution is that you have a prior expectation of seeing food. So to minimize surprise, you should put yourself into a situation where you might see food, i.e., seek food. This seems to work. However, if there is any learning at all, then you will quickly observe that the probability of seeing food is actually 20%, and your expectations should be adjusted accordingly. I will also observe that between two food expeditions, the probability of seeing food is 0%. Once this has been observed, surprise is minimal when I do not seek food. So, I die of hunger. It follows that the free energy principle does not survive Darwinian competition.
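
The arithmetic of the example can be spelled out explicitly (a toy calculation, using surprise = -log probability):

    import numpy as np

    def expected_surprise(p_food):
        """Expected surprise -E[log p(outcome)] once the model p_food = P(food) is learned."""
        p = np.array([p_food, 1.0 - p_food])
        p = p[p > 0]                       # outcomes with probability 0 never occur
        return float(-np.sum(p * np.log(p)))

    # World: sitting yields food with probability 0, seeking with probability 0.2.
    print(expected_surprise(0.0))   # sit  -> 0.0 nats: nothing is ever surprising
    print(expected_surprise(0.2))   # seek -> ~0.5 nats of expected surprise
    # With a learned model, surprise minimization favors sitting; only a fixed,
    # unlearnable prior P(food) = 1 would make seeking the less surprising option.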

Thus, either there is no learning at all and the free energy principle is just a way of calling predefined actions “priors”; or there is learning, but then it doesn’t account for goal-directed behavior.

The idea to act so as to minimize surprise resonates with some aspects of psychology, like cognitive dissonance theory, but that does not constitute a complete theory of mind, except possibly of the depressed mind. See for example the experience of flow (as in surfing): you seek a situation that is controllable but sufficiently challenging that it engages your entire attention; in other words, you voluntarily expose yourself to a (moderate amount of) surprise; in any case certainly not a minimum amount of surprise.

What is computational neuroscience? (XXVIII) The Bayesian brain

Our sensors give us incomplete, noisy, and indirect information about the world. For example, estimating the location of a sound source is difficult because in natural contexts, the sound of interest is corrupted by other sound sources, reflections, etc. Thus it is not possible to know the position of the source with certainty. The ‘Bayesian coding hypothesis’ (Knill & Pouget, 2004) postulates that the brain represents not the most likely position, but the entire probability distribution of the position. It then uses those distributions to do Bayesian inference, for example, when combining different sources of information (say, auditory and visual). This would allow the brain to optimally infer the most likely position. There is indeed some evidence for optimal inference in psychophysical experiments – although there is also some contradicting evidence (Rahnev & Denison, 2018).
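
As a concrete illustration of what “optimal inference” means here (the numbers are made up): with Gaussian likelihoods, the optimal combined estimate weights each cue by its inverse variance, and psychophysical experiments test whether human estimates follow this weighting.

    def combine(mu_a, var_a, mu_v, var_v):
        """Bayes-optimal fusion of two independent Gaussian cues (e.g. auditory and visual)."""
        w_a = (1 / var_a) / (1 / var_a + 1 / var_v)   # inverse-variance weighting
        mu = w_a * mu_a + (1 - w_a) * mu_v
        var = 1 / (1 / var_a + 1 / var_v)             # the fused estimate is more precise
        return mu, var

    # Hypothetical trial: auditory cue 10 deg with variance 16, visual cue 4 deg with variance 4.
    print(combine(10.0, 16.0, 4.0, 4.0))   # -> (5.2, 3.2): pulled toward the more reliable cue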

The idea has some appeal. The problem is that, by framing perception as a statistical inference problem, it focuses on the most trivial type of uncertainty, statistical uncertainty. It is illustrated by the following quote: “The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables”. Implicit in this representation is a particular model, for which variables are defined. Typically, one model describes a particular experimental situation. For example, the model would describe the distribution of auditory cues associated with the position of the sound source. Another situation would be described by a different model, for example one with two sound sources would require a model with two variables. Or if the listening environment is a room and the size of that room might vary, then we would need a model with the dimensions of the room as variables. In any of these cases where we have identified and fixed parametric sources of variation, then the Bayesian approach works fine, because we are indeed facing a problem of statistical inference. But that framework doesn’t fit any real life situation. In real life, perceptual scenes have variable structure, which corresponds to the model in statistical inference (there is one source, or two sources, we are in a room, the second source comes from the window, etc). The perceptual problem is therefore not just to infer the parameters of the model (dimensions of the room etc), but also the model itself, its structure. Thus, it is not possible in general to represent an auditory scene by a probability distribution on a set of parameters, because the very notion of a parameter already assumes that the structure of the scene is known and fixed.

Inferring parameters for a known statistical model is relatively easy. What is really difficult, and is still challenging for machine learning algorithms today, is to identify the structure of a perceptual scene, what constitutes an object (object formation), how objects are related to each other (scene analysis). These fundamental perceptual processes do not exist in the Bayesian brain. This touches on two very different types of uncertainty: statistical uncertainty, that is, variations that can be interpreted and expected within the framework of a model; and epistemic uncertainty, where the model itself is unknown (the difference has been famously explained by Donald Rumsfeld).
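
A toy illustration of the difference between the two kinds of uncertainty (entirely made up): data generated by a two-source scene, analyzed with a fixed one-source model. The statistical inference over the parameter is easy and confident, but meaningless, because the structure was wrong to begin with; and nothing in the inference itself signals that.

    import numpy as np

    rng = np.random.default_rng(0)
    # World: two sound sources, at -40 and +40 degrees, each contributing noisy cues.
    cues = np.concatenate([rng.normal(-40, 5, 500), rng.normal(40, 5, 500)])

    # Assumed structure: a single source at unknown position theta.
    theta = cues.mean()
    sem = cues.std(ddof=1) / np.sqrt(len(cues))
    print(f"inferred position: {theta:.1f} +/- {sem:.1f} degrees")   # about 0 +/- 1.3

    # A sharp posterior around 0 degrees describes nothing in the scene: the hard
    # problem (how many sources? what structure?) was settled before inference began.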

Thus, the “Bayesian brain” idea addresses an interesting problem (statistical inference), but it trivializes the problem of perception, by missing the fact that the real challenge is epistemic uncertainty (building a perceptual model), not statistical uncertainty (tuning the parameters): the world is not noisy, it is complex.

What is computational neuroscience? (XXVII) The paradox of the efficient code and the neural Tower of Babel

A pervasive metaphor in neuroscience is the idea that neurons “encode” stuff: some neurons encode pain; others encode the location of a sound; maybe a population of neurons encode some other property of objects. What does this mean? In essence, that there is a correspondence between some objective property and neural activity: when I feel pain, this neuron spikes; or, the image I see is “represented” in the firing of visual cortical neurons. The mapping between the objective properties and neural activity is the “code”. How insightful is this metaphor?

An encoded message is understandable to the extent that the reader knows the code. But the problem with applying this metaphor to the brain is that only the encoded message is communicated, not the code, and not the original message. Mathematically, original message = encoded message + code, but only one term is communicated. This could still work if there were a universal code that we could assume all neurons can read, the “language of neurons”, or if somehow some information about the code could be gathered from the encoded messages themselves. Unfortunately, this is in contradiction with the main paradigm in neural coding theory, “efficient coding”.

The efficient coding hypothesis stipulates that neurons encode signals into spike trains in an efficient way, that is, using a code such that all redundancy is removed from the original message while preserving information, in the sense that the encoded message can be mapped back to the original message (Barlow, 1961; Simoncelli, 2003). This implies that with a perfectly efficient code, encoded messages are indistinguishable from random. Since the code is determined by the statistics of the inputs and only the encoded messages are communicated, a code is efficient to the extent that it is not understandable by the receiver. This is the paradox of the efficient code.
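
The paradox is easy to demonstrate numerically (a sketch; whitening stands in here for redundancy reduction in general). A temporally redundant signal is encoded by a whitening transform estimated from its statistics; the encoded messages are decorrelated and look like noise, and decoding them requires the transform itself, which is never transmitted:

    import numpy as np

    rng = np.random.default_rng(1)

    # Redundant "sensory" signal: a random walk, strongly correlated in time.
    x = np.cumsum(rng.normal(size=2000))
    X = np.array([x[i:i + 20] for i in range(len(x) - 20)])   # snippets of 20 samples

    # The "code": a whitening transform estimated from the signal statistics.
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    W = vecs @ np.diag(1 / np.sqrt(vals)) @ vecs.T
    encoded = (X - X.mean(axis=0)) @ W

    # Encoded messages carry no redundancy the receiver could exploit:
    print(np.round(np.corrcoef(encoded, rowvar=False)[0, 1:4], 3))   # ~ 0, noise-like

    # Decoding requires the code itself, which is never part of the message:
    decoded = encoded @ np.linalg.inv(W) + X.mean(axis=0)
    print(np.allclose(decoded, X))                                    # True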

In the neural coding metaphor, the code is private and specific to each neuron. If we follow this metaphor, this means that all neurons speak a different language, a language that allows expressing concepts very concisely but that no one else can understand. Thus, according to the coding metaphor, the brain is a Tower of Babel.

Can this work?

What is computational neuroscience? (XXVI) Is optimization a good metaphor of evolution?

Is the brain the result of optimization, and if so, what is the optimization criterion? The popular argument in favor of the optimization view goes as follows. The brain is the result of Darwinian evolution, and therefore is optimally adapted to its environment, ensuring maximum survival and reproduction rates. In this view, to understand the brain is primarily to understand what “adapted” means for a brain, that is, what is the criterion to be optimized.

Previously, I have pointed out a few difficulties in optimality arguments used in neuroscience, in particular the problem of specification (what is being optimized) and the fact that evolution is a history-dependent process, unlike a global optimization procedure. An example of this history dependence is the fascinating case of mitochondria. Mitochondria are organelles in all eukaryotic cells that produce most of the cellular energy in the form of ATP. To date, the main view is that these organelles are a case of symbiosis: mitochondria were once prokaryotic cells that have been captured and farmed. This symbiosis has been selected and conserved through evolution, but optimization does not seem to be the most appropriate metaphor in this case.

Nonetheless, the optimization metaphor can be useful when applied to circumscribed problems that a biological organism might face, for example the energy consumption of action potential propagation. We can claim for example that, everything else being equal, an efficient axon is better than an inefficient one (with the caveat that in practice, not everything else can be made equal). But when applied at the scale of an entire organism, the optimization metaphor starts facing more serious difficulties, which I will discuss now.

When considering an entire organism, or perhaps an organ like the brain, then what criterion can we possibly choose? Recently, I started reading “Guitar Zero” by Gary Marcus. The author points out that learning music is difficult, and argues that the brain has evolved for language, not music. This statement is deeply problematic. What does it mean that the brain has evolved for language? Language does not preexist speakers, so it cannot be that language was an evolutionary (“optimization”) criterion for the brain, unless we have a more religious view of evolution. Rather, evolutionary change can create opportunities, which might be beneficial for the survival of the species, but there is no predetermined optimization criterion.

Another example is the color visual system of bees (see for example Ways of coloring by Thompson et al.). A case can be made that the visual system of bees is adapted to the color of flowers they are interested in. But conversely, the color of flowers is adapted to the visual system of bees. This is a case of co-evolution, where the “optimization criterion” changes during the evolutionary process.

Thus, the optimization criterion does not preexist the optimization process, and this makes the optimization metaphor weak.

A possible objection is that there is a preexisting optimization criterion, which is survival or reproduction rate. While this might be correct, it makes the optimization metaphor not very useful. In particular, it applies equally to all living species. The point is, there are species and they are different even though the optimization criterion is the same. Not all have a brain. Thus, optimization does not explain why we have a brain. Species that have a brain have different brains. The nervous system of a nematode is not the same as that of a human, even though they are all equally well adapted, and have evolved for exactly the same amount of time. Therefore, the optimization view does not explain why we speak and nematodes don’t, for example.

The problem is that “fitness” is a completely contextual notion, which depends both on the environment and on the species itself. In a previous post where I discussed an “existentialist” view of evolution, I proposed the following thought experiment. Imagine a very ancient Earth with a bunch of living organisms that do not reproduce but can survive for an indefinite amount of time. By definition, they are adapted since they exist. Then at some point, an accident occurs such that one organism starts multiplying. It multiplies until it occupies the entire Earth and resources become scarce. At this point of saturation, organisms start dying. The probability of dying being the same for both non-reproducing organisms and reproducing ones, at some point there will be only reproducing organisms. Thus in this new environment, reproducing organisms are adapted, whereas non-reproducing ones are not. If we look at the history of evolution, we note that the world of species constantly changes. Species do not appear to converge to some optimal state, because as they evolve, the environment changes and so does the notion of fitness.
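
The thought experiment is easy to simulate (a toy version with arbitrary numbers):

    import numpy as np

    rng = np.random.default_rng(3)

    # Arbitrary numbers: 1000 non-reproducing organisms, 1 reproducing accident,
    # the same death probability for everyone, and finite resources (capacity).
    non_repro, repro, capacity, p_death = 1000, 1, 2000, 0.01

    for _ in range(5000):
        non_repro -= rng.binomial(non_repro, p_death)
        repro -= rng.binomial(repro, p_death)
        room = capacity - (non_repro + repro)
        repro += min(room, repro)          # reproducers fill whatever space frees up
        if repro == 0:                     # the lone founder can die out by chance
            break

    print(non_repro, repro)   # typically ~0 and ~capacity: only reproducers remain,
                              # although nobody became better at surviving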

In summary, the optimization criterion does not preexist the optimization process, unless we consider a broad existentialist criterion such as survival, but then the optimization metaphor loses its usefulness.

What is computational neuroscience? (XXV) - Are there biological models in computational neuroscience?

Computational neuroscience is the science of how the brain “computes”, that is, how the brain performs cognitive functions such as recognizing a face or walking. Here I will argue that most models of cognition developed in the field, especially as regards sensory systems, are actually not biological models but hybrid models consisting of a neural model together with an abstract model.

First of all, many neural models are not meant to be models of cognition. For example, there are models that are developed to explain the irregular spiking of cortical neurons, or oscillations. I will not consider them. According to the definition above, I categorize them in theoretical neuroscience rather than computational neuroscience. Here I consider for example models of perception, memory, motor control.

An example that I know well is the problem of localizing a sound source from timing cues. There are a number of models, including a spiking neuron model that we have developed (Goodman and Brette, 2010). This model takes as input two sound waves, corresponding to the two monaural sounds produced by the sound source, and outputs the estimated direction of the source. But the neural model, of course, does not output a direction. Rather, the output of the neural model is the activity of a layer of neurons. In the model, we consider that direction is encoded by the identity of the maximally active neuron. In another popular model in the field, direction is encoded by the relative total activity of two groups of neurons (see our comparison of models in Goodman et al. 2013). In all models, there is a final step which maps the activity of neurons to estimated sound location, and this step is not a neural model but an abstract model. This causes big epistemological problems when it comes to assessing and comparing the empirical value of models because a crucial part of the models is not physiological. Some argue that neurons are tuned to sound location; others that population activity varies systematically with sound location. Both are right, and thus none of these observations is a decisive argument to discriminate between the models.
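
To make the role of this final, non-neural step explicit, here is an illustrative sketch (not the published models, just the two readout conventions applied to the same hypothetical output layer):

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical output layer: 19 neurons with preferred directions from -90 to +90 degrees.
    preferred = np.linspace(-90, 90, 19)
    true_direction = 30.0
    rates = rng.poisson(50 * np.exp(-(preferred - true_direction) ** 2 / (2 * 30 ** 2)))

    # Decoder 1: direction = identity of the maximally active neuron.
    peak_readout = preferred[np.argmax(rates)]

    # Decoder 2: direction read from the relative total activity of two pools.
    left, right = rates[preferred < 0].sum(), rates[preferred > 0].sum()
    ratio_readout = (right - left) / (right + left)   # monotonic in direction, needs calibration

    print(peak_readout, ratio_readout)
    # The neural model only outputs `rates`; each decoder is an extra, abstract model,
    # and both readouts vary lawfully with the true direction.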

The same is seen in other sensory modalities. The output is the identity of a face; or of an odor; etc. The symmetrical situation occurs in motor control models: this time the abstract model is on the side of the input (mapping from spatial position to neural activity or neural input). Memory models face this situation twice, with abstract models both on the input (the thing to be memorized) and the output (the recall).

Fundamentally, this situation has to do with the fact that most models in computational neuroscience take a representational approach: they describe how neural networks represent in their firing some aspect of the external world. The representational approach requires defining a mapping (called the “decoder”) from neural activity to objective properties of objects, and this mapping cannot be part of the neural model. Indeed, sound location is a property of objects and thus does not belong to the domain of neural activity. So no sound localization model can ever be purely neuronal.

Thus to develop biological models, it is necessary to discard the representational approach. Instead of “encoding” things, neurons control the body; neurons are agents (rather than painters in the representational approach). For example, a model of sound localization should be a model of an orientational response, including the motor command. The model explains not how space is “represented”, but how an animal orients its head (for example) to a sound source. When we try to model an actual behavior, we find that the nature of the problem changes quite significantly. For example, because a particular behavior is an event, neural firing must also be seen as events. In this context, counting spikes and looking at the mutual information between the count and some stimulus property is not very meaningful. What matters is the events that the spikes trigger in the targets (muscles or other neurons). The goal is not to represent the sensory signals but to produce an appropriate behavior. One also realizes that the relation between sensory signals and actions is circular, and therefore cannot be adequately described as “processing”: sensory signals make you turn the head, but if you turn the head, the sensory signals change.
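
Here is a minimal sketch of what a closed-loop model of this kind could look like (the control rule and the numbers are mine, purely illustrative): the output is a motor command, not an estimate, and the sensory signal changes as a consequence of the action.

    import numpy as np

    source, head = 40.0, 0.0      # source azimuth and head orientation (degrees)
    k, dt = 100.0, 0.05           # turning speed per unit cue (deg/s), time step (s)

    for _ in range(100):
        # Sensory signal: a lateralized binaural cue that depends on the relative
        # angle, and therefore changes every time the head moves.
        cue = np.sin(np.deg2rad(source - head))
        # Motor command: turn toward the side of the cue. Nothing in the loop
        # "represents" the source position; the loop simply drives the cue to zero.
        head += k * cue * dt

    print(round(head, 1))   # 40.0: the head ends up facing the source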

Currently, most models of cognition in computational neuroscience are not biological models. They include neuron models together with abstract models, a necessity stemming from the representational approach. To make a biological model requires including a model of the sensorimotor loop. I believe this is the path that the community should take.

What is computational neuroscience? (XXIV) - The magic of Darwin

Darwin’s theory of evolution is possibly the most important and influential theory in biology. I am not going to argue against that claim, as I do believe that it is a fine piece of theoretical work, and a great conceptual advance in biology. However, I also find that the explanatory power of Darwin’s theory is often overstated. I recently visited a public exhibition in a museum about Darwin. Nice exhibition overall, but I was a bit bothered by the claim that Darwin’s theory explains the origin, diversity and adaptedness of species, case solved. I have the same feeling when I read in many articles or when I hear in conversations with many scientists that such and such observed feature of living organisms is “explained by evolution”. The reasoning generally goes like this: such biological structure is apparently beneficial to the organism, and therefore the existence of that structure is explained by evolution. As if the emergence of that structure directly followed from Darwin’s account of evolution.

To me, the Darwinian argument is often used as magic, and is mostly void of any content. Replace “evolution” by “God” and you will notice no difference in the logical structure or arguments. Indeed, what the argument actually contains is 1) the empirical observation that the biological organism is apparently adapted to its environment, thanks to the biological feature under scrutiny; 2) the theoretical claim that organisms are adapted to their environment. Note that there is nothing in the argument that actually involves evolution, i.e., the change of biological organisms through some particular process. Darwin is only invoked to back up the theoretical claim that organisms are adapted, but there is nothing specifically about Darwinian evolution that is involved in the argument. It could well be replaced by God, Lamarck or aliens.

What makes me uneasy is that many people seem to think that Darwin’s theory fully explains how biological organisms get to be adapted to their environment. But even in its modern DNA form, it doesn’t. It describes some of the important mechanisms of adaptation, but there is an obvious gap. I am not saying that Darwin’s theory is wrong, but simply that it only addresses part of the problem.

What is Darwin’s theory of evolution? It is based on three simple steps: variation, heredity and selection. 1) Individuals of a given species vary in different respects. 2) Those differences are inherited. In the modern version, new variations occur randomly at this step, and so variations are introduced gradually over generations. 3) Individuals with adapted features survive and reproduce more than others (by definition of “adapted feature”), and therefore spread those features in the population. There is ample empirical evidence for these three claims, and that was the great achievement of Darwin.

The gap in the theory is the nature and distribution of variations. In the space of all possible small variations in structure that one might imagine, do we actually see them in a biological population? Well for one, there are a substantial number of individuals that actually survive for a certain time, so a large number of those variations are not destructive. Since the metaphor of the day is to see the genome as a code for a program, let us consider computer programs. Take a functional program and randomly change 1% of all the bits. What is the probability that 1) the program doesn’t crash, 2) it produces something remotely useful? I would guess that the probability is vanishingly small. You will note that this is not a very popular technique in software engineering. Another way to put it: consider the species of programs that calculate combinatorial functions (say, factorials, binomial coefficients and the like). Surely one might argue that individuals vary by small changes, but conversely, would a small random change in the code typically produce a new combinatorial function?
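
The thought experiment can actually be run (a toy version, with the obvious caveat that a short Python function is not a genome): flip a small fraction of the bits of a working program and count how often anything functional survives.

    import random

    source = "def fact(n):\n    return 1 if n < 2 else n * fact(n - 1)\n"
    random.seed(0)

    def mutate(text, rate=0.01):
        """Flip each bit of the source code with probability `rate`."""
        data = bytearray(text.encode())
        for i in range(len(data)):
            for bit in range(8):
                if random.random() < rate:
                    data[i] ^= 1 << bit
        return data

    survivors = 0
    for _ in range(1000):
        try:
            env = {}
            exec(compile(mutate(source).decode(), "<mutant>", "exec"), env)
            if env["fact"](5) == 120:      # the mutant still computes factorials
                survivors += 1
        except Exception:
            pass                           # syntax error, name error, crash, ...
    print(survivors, "/ 1000 mutants still work")
    # Typically on the order of 1%, and most of those are copies that the
    # mutation happened to leave untouched.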

So it doesn’t follow logically from the three steps of Darwin’s theory that biological organisms should be adapted and survive to changing environments. There is a critical ingredient that is missing: to explain how, in sharp contrast with programs, a substantial fraction of new variations are constructive rather than destructive. In more modern terms, how is it that completely random genetic mutations result in variations in phenotypes that are not arbitrary?

Again I am not saying that Darwin is wrong, but simply that his theory only addresses part of the problem, and that it is not correct to claim that Darwin’s theory fully explains how biological organisms are adapted to their environment (ie, perpetuate themselves). A key point, and a very important research question, is to understand how new variations can be constructive. This can be addressed within the Darwinian framework, as I outlined in a previous post. It leads to a view that departs quite substantially from the program metaphor. A simple remark: the physical elements that are subject to random variation cannot be mapped to the physical elements of structure (e.g. molecules) that define the phenotype, for otherwise those random variations would lead to random (ie mostly destructive) phenotypes. Rather, the structure of the organism must be the result of a self-regulatory process that can be steered by the elements subject to random variation. This is consistent with the modern view of the genome as a self-regulated network of genes, and with Darwin’s theory. But it departs quite substantially from the magic view of evolution theory that is widespread in the biological literature (at least in neuroscience), and instead points to self-regulation and optimization processes operating at the scale of the individual (not of generations).

What is computational neuroscience? (XXIII) On optimality principles in neuroscience

The notion of optimality of biological structures is both quite popular as a type of explanation and highly criticized by many scientists. It is worth understanding why exactly.

In a previous post, I observed that there are different types of explanations, one of which is final cause. Final cause would be for example the minimization of energy in physics or survival and reproduction in biology. Evolutionary theory makes final causes very important in biology. However, I find that such explanations are often rather weak. Such explanations generally take the form: we observe such biological structure because it is optimal in some sense. What exactly is explained or meant here is not always so clear. That a biological structure is optimal means that we consider a set of possible structures, and among that set the observed structure maximizes some criterion. But what is this set of possible structures? Surely not all possible molecular structures. Indeed evolutionary theory does not say that biological organisms are optimal. It says that changes in structure that occur from one generation to the next tend to increase “adaptability” of the organism (there are variations around this theme, such as gene-centered theories). Evolution is a process and biological organisms result from an evolutionary history: they are not absolute optima among all possible molecular structures (otherwise there would not be many species).

To see this, consider the following analogy. From the postulate that people tend to maximize their own interest, I propose the following explanation of social structure: rich people are rich because they want to be rich. Why is this explanation not satisfying? Because both poor and rich people tend to maximize their own interest (by assumption), and yet only the latter are rich. The problem is that we have specified a process that has a particular trend (increasing self-interest), but there is no necessity that this process reaches a general optimum of some sort. It is only optimal within a particular individual history. Maybe the poor people have always acted in their own interest, and maybe they are richer than they would be otherwise, but that doesn’t mean they end up rich. In the same way, evolution is a process and it only predicts an optimum within a particular evolutionary history.

Thus, the first remark is that optimality must be understood in the context of a process, both phylogenetic (species history) and ontogenetic (development), not as a global property. Optimality can only be local with respect to that process – after all, there are many species, not a single “optimal” one. That is to say, the fitness criterion (which has to be defined more precisely, see below) tends to increase along the process, so that, at equilibrium (assuming there is such a thing – see the theory of punctuated equilibria), the criterion is locally maximized with respect to that process (i.e., it cannot be increased by the process).

This is the first qualification. There are at least two other types of criticisms that have been raised, which I want to address now, one empirical and one theoretical. The empirical criticism is that biological organisms are not always optimal. The theoretical criticism is that biological organisms do not need to be optimal but only “good enough”, and there might be no evolutionary pressure when organisms are good enough.

I will first address the empirical criticism: biological organisms are not always optimal. First, they are not expected to be, because of the above qualification. But this is not the deep point. This criticism raises the problem of specification: optimal with respect to what? The Darwinian argument only specifies (local) optimality with respect to survival and reproduction. But optimality is generally discussed with respect to some particular aspect of structure or behavior. The problem is that it is generally not obvious at all how the evolutionary fitness criterion should translate into structure. This is the problem of specification.

For example, I have heard the argument that “people are not optimal”. I take it that it is meant that people are not rational. This is indeed a very well established fact of human psychology. If you haven’t read it yet, I invite you to read “Thinking, fast and slow” by Daniel Kahneman. There are all sorts of cognitive biases that make us humans not very rational in general. To give you a random example, take the “planning fallacy”: if you try to plan the duration of a substantial project (say, building a house or writing a book), then you will almost always underestimate it by an order of magnitude. The reason is that when planning, you imagine a series of steps that are necessary to achieve the project but you don’t imagine all the possible accidents that might happen (say the contractor dies). Any specific accident is very unlikely so you don’t or can’t think about it, but it is very likely that one accident of this type happens, and so you seriously underestimate the completion time. Annoyingly, you still do if you know about the fallacy (at least I still do). This is the problem of epistemic uncertainty (events that are not part of your probabilistic model, as opposed to probabilistic uncertainty, as in rolling a die – see e.g. the Black Swan by Taleb). So humans are not optimal with respect to the rationality criterion. Why is that? Perhaps rationality does not give you an evolutionary advantage. Or perhaps it would by itself, but it would also come with a very large cost in terms of maintaining the adequate structure. Or perhaps it would require such a different brain structure from what humans currently have that no evolutionary step could possibly take us there. Or perhaps it is just impossible to be rational, because the problem of epistemic uncertainty is so fundamental. I am not trying to give an answer, but simply pointing out that the evolutionary argument does not imply that structure and behavior should be optimal with respect to all criteria that seem desirable. Evolutionary “fitness” is a complex notion that encompasses a set of contradicting subcriteria and history effects.

With this important qualification in mind, it should be noted that there are many aspects of biological structure and behavior that have been shown quite convincingly to be optimal or near-optimal with respect to appropriately chosen criteria. It would be sad to discard them, because those explanations give parsimonious accounts of large sets of empirical data. For example, while people are generally not rational or consistent in their reasoning and decisions, when it comes to perceptual or motor tasks it is well documented that humans tend to be near optimal, as accounted for by the Bayesian framework. There are of course important qualifications, but it is the case that many aspects of perception are well predicted by the Bayesian framework, at a quantitative (not just qualitative) level (note that I don’t mean perception in the phenomenological sense, but simply in the sense of sensory-related tasks). One big difference with the preceding example is that there is no epistemic uncertainty in these tasks; that is, when perceptual systems have a good statistical model of reality, then they seem to use it in a Bayesian-optimal way.

There are also convincing cases of optimality for biological structures. Robert Rosen discusses some of them in his book “Optimality principles in biology”, in particular the structure of the vascular system (which also seems to apply to lungs). Many geometrical aspects of the vascular system, such as angle and diameter at branching points and even the number of blood vessels can be accounted for by optimality principles with respect to appropriately chosen (but importantly, simple) criteria. This latter point is critical, as is pointed out in that book. Two criteria are simultaneously considered in this case: maximizing the surface of contact and minimizing the resistance to flow (and thus the required energy).
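
The core of that argument can be stated in a few lines (my paraphrase of the standard “Murray’s law” calculation, not a quote from Rosen). For a vessel segment of radius r and length L carrying a flow Q, take the cost to be the viscous dissipation (Poiseuille) plus a metabolic cost proportional to blood volume:

\[
P(r) = \frac{8\mu L Q^2}{\pi r^4} + b\,\pi r^2 L,
\qquad \frac{dP}{dr}=0 \;\Rightarrow\; Q \propto r^3 .
\]

Conservation of flow at a branching point then gives \(r_{\mathrm{parent}}^3 = r_1^3 + r_2^3\), and a similar minimization yields the branching angles; these predictions can be compared directly with anatomical measurements.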

Another well documented case, this time in neurophysiology, is in the geometrical and molecular properties of axons. There is a short paper by Hodgkin that in my opinion shows pretty good optimality reasoning, including some of the qualifications I have mentioned: “The Optimum Density of Sodium Channels in an Unmyelinated Nerve” (1975). He starts by noting that the giant squid axon mediates the escape reflex, and it is critical for survival that this reflex is fast. Therefore speed of conduction along the axon is a good candidate for an optimality analysis: it makes sense, from an evolutionary viewpoint, that the structure of the axon is optimized for speed. Then he tries to predict the density of sodium channels that would maximize speed. As it turns out, this simple question is itself quite complex. He argues that each channel also increases the membrane capacitance, in a way that depends on voltage because the geometrical conformation of the channels is voltage-dependent. Nonetheless he manages to estimate that effect and derives an optimal channel density, which turns out to be of the right order of magnitude (compared with measurements). He also notes that the relation between channel density and velocity has a “flat maximum”, so the exact value might also depend on other aspects than conduction speed.
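
The flavor of the calculation can be reproduced with a toy version (my simplification, not Hodgkin’s actual equations): suppose each channel adds conductance but also gating capacitance, and that velocity scales roughly as sqrt(gNa)/Cm. Then velocity as a function of channel density N goes as sqrt(N)/(C0 + kN), which has a flat maximum:

    import numpy as np

    # Toy scaling (assumption): velocity ~ sqrt(N) / (C0 + k*N), where N is channel
    # density, C0 the bare membrane capacitance, k the capacitance added per channel.
    C0, k = 1.0, 0.002                      # arbitrary units
    N = np.linspace(1, 2000, 2000)
    v = np.sqrt(N) / (C0 + k * N)

    print(N[np.argmax(v)])                  # ~ C0/k = 500: the predicted optimum
    # The maximum is flat: densities from ~250 to ~1000 all give more than 94% of
    # the peak velocity, so other criteria can shift the actual value at little cost.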

He then discusses those other aspects at the end of the text. He notes that in other cases (different axons and species), the prediction based on speed does not work so well. His argument then is that speed may simply not be the main relevant criterion in those other cases. It was in the case of the squid axon because it mediates a time-critical escape reflex, but in other cases speed may not be so important and instead energy consumption might be more relevant. Because the squid axon mediates an escape reflex, it very rarely spikes and so energy consumption is presumably not a big issue – compared to being eaten alive because you are too slow. But energy consumption might be a more important criterion for axons that fire more often (say, cortical neurons in mammals). There is indeed a large body of evidence that tends to show that many properties of spike initiation and propagation are adapted for energy efficiency (again, with some qualifications, e.g. fast-spiking cells are thought to be less efficient, apparently a necessary price for firing at high rates). There are other structures where axon properties seem to be tuned for isochrony, yet another type of criterion. Isochrony means that spikes produced by different neurons arrive at the same time at a common projection. This seems to be the case in the optic tract (Stanford 1987, “Conduction Velocity Variations Minimize Conduction Time Differences Among Retinal Ganglion Cell Axons”) and many other structures, for example the binaural system of birds. Thus many aspects of axon structure seem to show a large degree of adaptation, but to a diversity of functional criteria, and it often involves trade-offs. This concludes the discussion of the problem of specification.

The second criticism is not empirical but theoretical: biological organisms do not need to be optimal but only “good enough”, and there might be no evolutionary pressure when organisms are good enough. There is an important sense in which this is true. This is highlighted by Hodgkin in the paper I mentioned: there is a broad range of values for channel density that leads to near-optimal (“good enough”) velocities, and so the exact value might depend on other, less important, criteria, such as energy consumption. But note that this reasoning is still about optimality; simply, it is acknowledged that organisms are not expected to be optimal with respect to any single criterion, since survival depends on many aspects. A related point is that of robustness and redundancy. It appears that there is a lot of redundancy in biological systems. You could lose a kidney and still be fine, and this is also true at cellular level. This again can be thought of in terms of epistemic uncertainty: you could build something that is optimal with respect to a particular model of the world, but it might make that something very fragile to unexpected perturbations, events that are not predicted by the model. Thus redundancy or more generally robustness is desirable, even though it makes organisms suboptimal with respect to any specific model.

But note that we have not left the framework of evolutionary fitness, since we have described redundancy as a desirable feature (as opposed to a random feature). We have simply refined the concept of optimality, which it should be clear now is quite complex, as it must be understood with respect to a constellation of possibly contradictory subcriteria as well as with respect to epistemic uncertainty. But we are still making all those qualifications within the framework of adaptive fitness of organisms. This does not mean that biological organisms can be suboptimal because of lack of strong evolutionary pressure. More precisely, it means that they can be suboptimal with respect to a particular criterion for which there is a lack of strong evolutionary pressure, if the same structure is also subjected to evolutionary pressure on another criterion. These two criteria could be for example conduction speed and energy consumption.

Yet it could be (and has been) argued that even if a structure were subjected to a single criterion, it might still not be optimal with respect to that criterion if evolutionary pressure is weak. For example, it is often stated that spikes of the squid giant axon are not efficient, as in the Hodgkin-Huxley model they are about 3-4 times more energetically expensive than strictly necessary. Because those axons fire very rarely, it makes little difference whether spikes are efficient or not. Considering this fact, spike efficiency is “good enough”.

I find this theoretical argument quite weak. First, let me note that the 3-4 factor applies to the Hodgkin-Huxley model, which was calibrated mainly for action potential shape, but since then it has been refined and the factor is actually smaller (see e.g. work by Bezanilla). But it’s not the important point. From a theoretical viewpoint, even if evolutionary pressure is weak, it is not nonexistent. By a simple argument I made before, biological organisms must live in environments where resources are scarce, and so there is strong pressure for efficient use of energy and resources in general. Thus even if the giant axon’s spike is a small proportion of that use, there is still some pressure on its efficiency. Squids are a very old species and there seems to be no reason why that pressure might not have applied at some point. But again this is not the most important point. In my view, the most important point is that evolutionary pressure does not apply at the level of individual elements of structure (e.g., on each axon). It applies at the level of genes, which have an impact on the entire organism, or at least a large part of it. So the question is not whether the giant squid axon is energy efficient, but rather whether spike conduction along axons is efficient. It is also quite possible that mechanisms related to metabolism are more generic. Thus, while there might be little evolutionary pressure on that particular axon, there certainly is on the set of all axons.

Why then is the squid giant axon inefficient? (I’m assuming it actually is, although to a lesser degree than usually stated.) Here is a possible explanation. Efficiency of spike propagation and initiation depends on the properties of ionic channels. In particular, to have spikes that are both fast and efficient you need sodium channels to inactivate very fast. There is likely a limit to how fast it can be, since proteins are discrete structures (which might explain why fast-spiking cortical neurons are relatively inefficient). In mammals, fast inactivation is conveyed not by the main protein of the sodium channel (alpha subunit) but by so-called beta subunits, which are distinct proteins that modulate channel properties. This comes with a cost, since all those proteins need to be actively maintained (the resting cost). If the neuron spikes often, most of the energetic cost is incurred by spikes. If it spikes very rarely, most of the energetic cost is the resting cost. When that resting cost is taken into account, it might well be that spikes in the squid giant axon are actually quite efficient. The same line of reasoning might explain why such a big axon is not myelinated (or doesn’t show a similar kind of adaptation): myelination decreases the energetic cost of spike propagation for large-diameter axons, but it increases the resting cost (you need glial cells producing myelin).
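
A back-of-the-envelope version of this trade-off (all numbers invented for illustration): the energy budget is the resting cost plus the firing rate times the per-spike cost, so which term is worth optimizing depends entirely on how often the axon fires.

    # Invented numbers, arbitrary units per second; only the comparison matters.
    def total_cost(rate_hz, resting, per_spike):
        return resting + rate_hz * per_spike

    A = dict(resting=10.0, per_spike=1.0)   # cheap spikes, costly machinery at rest
    B = dict(resting=1.0, per_spike=4.0)    # expensive spikes, little resting overhead

    for rate in (0.05, 50.0):               # a rarely used escape axon vs. a busy cortical axon
        print(rate, total_cost(rate, **A), total_cost(rate, **B))
    # At 0.05 Hz, B wins (1.2 vs 10.05): "inefficient" spikes can be the sensible
    # design for an axon that almost never fires. At 50 Hz, A wins (60 vs 201).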

To conclude: optimality principles are important in biology because these are principles that are specific to living organisms (i.e., they are somehow adapted) and they do explain a large body of empirical data. However, these must be applied with care, keeping in mind the problem of specification (optimality with respect to what; i.e. what is actually important for survival), the problem of history effects (optimality is local relative to phylogenetic and ontogenetic processes) and the problem of epistemic uncertainty (leading to robustness principles).

 

Update: I noticed that in Hodgkin’s “Sherrington Lectures” (1964), the author estimates that the mean firing frequency of the giant axon in the life of a squid does not exceed a few impulses per minute, which should produce a sodium influx due to spikes of about 1/300 of the influx at rest (leak). Thus the cost of spikes is indeed a negligible proportion of the total cost of axon maintenance.

What is computational neuroscience? (XXII) The whole is greater than the sum of its parts

In this post, I want to come back on methodological reductionism, the idea that the right way, or the only way, to understand the whole is to understand the elements that compose it. A classical rebuttal of methodological reductionism is that the “whole is greater than the sum of its parts” (Aristotle). I feel that this argument is often misunderstood, so I have thought of a simple example from biology.

Cells are enclosed by membranes, which are made of lipids. A membrane is a closed surface that defines an interior and an exterior. No part of a membrane is a membrane, because it is not a closed surface. You could study every single lipid molecule that forms a membrane in detail, and you would still have no understanding of what a membrane is, despite the fact that these molecules are all there is in the membrane (ontological reductionism), and that you have a deep understanding of every single one of them. This is because a membrane is defined as a particular relationship between the molecules, and therefore is not contained in or explained by any of them individually.

There is another important epistemological point in this example. You might want to take a “bottom-up” approach to understanding what a membrane is. You would start by looking at a single lipid molecule. Then you could take a larger patch of membrane and study it, building on the knowledge you have learned from the single molecule. Then you could look at larger patches of membrane to understand how they differ from smaller patches; and so on. However, at no stage in this incremental process do you approach a better understanding of what a membrane is, because the membrane only exists in the whole, not in a part of it, even a big part. “Almost a membrane” is not a membrane. In terms of models, a simple model of a cell membrane consisting of only a small number of lipid molecules arranged as a closed surface captures what a membrane is much better than a large-scale model consisting of almost all molecules of the original cell membrane.

This criticism applies in particular to purely data-driven strategies to understand the brain. You could think that the best model of the brain is the one that includes as much detailed empirical information about it as possible. The fallacy here is that no part of the brain is a brain. An isolated cortex in a box, for example, does not think or behave. A slice of brain is also not a brain. Something “close to the brain” is still not a brain. A mouse is a better model of a human than half a human, which is bigger and physically more similar but dead. This is the same problem as for understanding a membrane (a much simpler system!): the methodologically reductionist strategy misses that it is not the elements themselves that make the whole, it is the relationship between the elements. So the key to understand such systems is not to increase the level of detail or similarity, but to capture relevant higher-order principles.

What is computational neuroscience? (XXI) Lewis Carroll and Norbert Wiener on detailed models

The last published novel of Lewis Carroll, Sylvie and Bruno (1893 for the second volume), contains a passage that explains that a high level of detail is not necessarily what you want from a model. I quote it in full:

“What a useful thing a pocket-map is!” I remarked.

“That’s another thing we’ve learned from your Nation,” said Mein Herr, “map-making. But we’ve carried it much further than you. What do you consider the largest map that would be really useful?”

“About six inches to the mile.”

“Only six inches!” exclaimed Mein Herr. “We very soon got to six yards to the mile. Then we tried a hundred yards to the mile. And then came the grandest idea of all! We actually made a map of the country, on the scale of a mile to the mile!”

“Have you used it much?” I enquired.

“It has never been spread out, yet,” said Mein Herr: “the farmers objected: they said it would cover the whole country, and shut out the sunlight! So we now use the country itself, as its own map, and I assure you it does nearly as well.”

In other words: if the model is nearly as complex as the thing it applies to, then it is no more useful than the thing itself. This theme also appears in a 1945 essay by Arturo Rosenblueth and Norbert Wiener, “The Role of Models in Science”:

“The best material model for a cat is another, or preferably the same cat. In other words, should a material model thoroughly realize its purpose, the original situation could be grasped in its entirety and a model would be unnecessary. […] This ideal theoretical model cannot probably be achieved. Partial models, imperfect as they may be, are the only means developed by science for understanding the universe. This statement does not imply an attitude of defeatism but the recognition that the main tool of science is the human mind and that the human mind is finite.”

The last sentence is the most important: a model is not something that is meant to mimic reality; it is something that is constructed by and for the human mind to help it grasp complex aspects of reality.