What is computational neuroscience? (XXXIV) Is the brain a computer (2)

In a previous post, I argued that the way the brain works is not algorithmic, and therefore that it is not a computer in the common sense of the term. This contradicts a popular view in computational neuroscience that the brain is a kind of computer that implements algorithms. That view comes from formal neural network theory, and the argument goes as follows. Formal neural networks can implement any computable function, and a computable function is a function that can be implemented by an algorithm. Thus the brain can implement algorithms for computable functions, and therefore is by definition a computer. There are multiple errors in this reasoning. The most salient is a semantic drift on the concept of algorithm; the second major error is a confusion about what a computer is.

Algorithms

A computable function is a function that can be implemented by an algorithm. But the converse “if a function is computable, then whatever implements this function runs an algorithm” is not true. To see this, we need to be a bit more specific about what is meant by “algorithm” and “computable function”.

Loosely speaking, an algorithm is simply a set of explicit instructions to solve a problem. A cooking recipe is an algorithm in this sense. For example, to cook pasta: put water in a pan; heat it up; when the water boils, put in the pasta; wait for 10 minutes. The execution of this algorithm occurs in continuous time in a real environment, but what is algorithmic about this description is the discrete sequential flow of instructions. Water boiling is not itself algorithmic; the high-level instructions are: “when condition A is true (water boils), then do B (put in the pasta)”. Thus, when we speak of algorithms, we must define what counts as an elementary instruction, that is, what lies beneath the algorithmic level (water boils, put in the pasta).
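To make the distinction concrete, here is a toy rendering of the recipe as code (purely illustrative; the numbers and function names are invented): the control flow is a discrete sequence of instructions, while the physics it refers to (heating, boiling) stays outside of it.

```python
def water_is_boiling(temperature_c):
    # The boiling itself is continuous physics; only this test belongs to the recipe.
    return temperature_c >= 100.0

def cook_pasta():
    """The pasta recipe as a discrete sequential flow of instructions."""
    temperature_c = 20.0                # put water in a pan (invented initial state)
    while not water_is_boiling(temperature_c):
        temperature_c += 5.0            # heat up (crude stand-in for the continuous process)
    print("water boils: put in the pasta")
    for minute in range(10):            # wait for 10 minutes
        pass
    print("pasta is ready")

cook_pasta()
```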

The textbook definition of an algorithm in computer science is: "a sequence of computational steps that transform the input into the output" (Cormen et al., Introduction to Algorithms; possibly the most used textbook on the subject). Computability is a way to formalize the notion of algorithm for functions of integers (in particular logical functions). To formalize it, one needs to specify what is considered an elementary instruction. Thus, computability does not formalize the loose notion of algorithm above, i.e., any recipe to calculate something, for otherwise any function would be computable and the concept would be empty (to calculate f(x), apply f to x). A computable function is a function that can be calculated by a Turing machine, or equivalently, one that can be generated by a small set of elementary functions on integers (with composition and recursion). Thus, an algorithm in the sense of computability theory is a discrete-time sequence of arithmetic and logical operations (and recursion). Note that this readily extends to any countable alphabet instead of integers, and of course you can replace arithmetic and logical operations with higher-order instructions, as long as they are themselves computable (i.e., a high-level programming language). But it is not any kind of specification of how to solve a problem. For example, there are various algorithms to calculate pi. But we could also calculate pi by drawing a circle, measuring both the diameter and the perimeter, then dividing the perimeter by the diameter. This is not an algorithm in the sense of computability theory. It could be called an algorithm in the broader sense, but again note that what is algorithmic about it is the discrete structure of the instructions.
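For illustration, here is one such algorithm for pi (the Leibniz series; chosen here only as an example, it is not discussed in the text): a discrete sequence of arithmetic operations, in contrast with the circle-drawing method, which involves no such sequence.

```python
def leibniz_pi(n_terms):
    """Approximate pi with the Leibniz series pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    An algorithm in the strict sense: a discrete sequence of arithmetic operations."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

print(leibniz_pi(1_000_000))   # ~3.141592 (converges slowly)
```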

Thus, a device could calculate a computable function using an algorithm in the strict sense of computability theory, or in the broader sense (a cooking recipe), or in a non-algorithmic way (i.e., without any discrete structure of instructions). In any case, what the brain or any other device manages to do bears no relation to how it does it.

As pointed out above, what is algorithmic about a description of how something works is its discrete structure (first do A; if B is true, then do C; etc.). If we removed this condition, then we would be left with the more general concept of a model, not an algorithm: a description of how something works. Thus, if we want to say anything specific by claiming that the brain implements algorithms, then we must insist on the discrete-time structure (steps). Otherwise, we are just saying that there is a model of how the brain works.

Now that we have more precisely defined what an algorithm is, let us examine whether the brain might implement algorithms. Clearly, it does not literally implement algorithms in the narrow sense of computability theory, i.e., with elementary operations on integers and recursion. But could it be that it implements algorithms in the broader sense? To get some perspective, consider the following two physical systems:

(A) is a set of dominoes; (B) is a tent (illustration taken from my essay “Is coding a relevant metaphor for the brain?”). Both are physical systems that interact with an environment, and in particular can be perturbed by mechanical stimuli. The response of the dominoes to mechanical stimuli might be likened to an algorithm, but that of the tent cannot. The fact that we can describe unambiguously (with physics) how the tent reacts to mechanical stimuli does not make the dynamics of the tent algorithmic, and the same is true of the brain. Formal neural networks (e.g. perceptrons or deep learning networks) are algorithmic, but the brain is a priori more like the tent: a set of coupled neurons that interact in continuous time, with each other and with the environment, with no evident discrete structure resembling an algorithm. As argued above, a specification of how these real neural networks work and solve problems is not an algorithm: it is a model – unless we manage to map the brain’s dynamics onto the discrete flow of an algorithm.

Computers

Thus, if a computer is something that solves problems by running algorithms, then the brain is not a computer. We may however consider a broader definition: a computer is something that computes, i.e., that is able to calculate computable functions. As pointed out above, this does not require the computer to run algorithms. For example, consider a box filled with gas, with a heater (input = temperature T) and a pressure sensor (output = P). The device computes the function P = nRT/V by virtue of physical laws, not by an algorithm.
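As a side note, the function that the box computes can be written down in one line (the parameter values below are arbitrary); but this description of the function says nothing about how the box arrives at the result, which is by continuous physics rather than by a sequence of steps.

```python
R = 8.314  # gas constant, J/(mol*K)

def pressure(n_mol, temperature_K, volume_m3):
    """The function computed by the gas box: the ideal gas law P = nRT/V."""
    return n_mol * R * temperature_K / volume_m3

print(pressure(n_mol=1.0, temperature_K=300.0, volume_m3=0.01))  # about 249 kPa (value returned in Pa)
```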

This box, however, is not a computer. Otherwise, any physical system would be called a computer. To be called a computer, the device should be able to implement any computable function. But what does this mean exactly? To run an arbitrary computable function, some parameters of the device need to be appropriately adjusted. Who adjusts these parameters, and how? If we do not specify how this adjustment is made, then the claim that the brain is a computer is essentially empty. It just says that for each function, there is a way to arrange the structure of the brain so that this function is achieved. It is essentially equivalent to the claim that atoms can calculate any computable function, depending on how we arrange them.

To call such a device a computer, we must additionally include a mechanism to adjust the parameters so that it does actually perform a particular computable function. This leads us to the conventional definition of a computer: something that can be instructed via computer programming. The notion of program is central to the definition of computers, whatever form this program takes. A crucial implication is that a computer is a device that is dependent on an external operator for its function. The external operator brings the software to the computer; without the ability to receive software, the device is not a computer.

In this sense, the brain cannot be a computer. We may then consider the following metaphorical extension: the brain is a self-programmed computer. But the circularity in this assertion is problematic. If the program is a result of the program itself, then the “computer” cannot actually implement any computable function, but only those that result from its autonomous functioning. A cat, a mouse, an ant and a human do not actually do the same things, and cannot even in principle do the same tasks.

Finally, is computability theory the right framework to describe the activity of the brain in the first place? It is certainly not the right framework to describe the interaction of a tent with its environment, so why would it be appropriate for the brain, an embodied dynamical system in circular relation with its environment? Computability theory is a theory about functions, but a dynamical system is not a function. You can of course define functions on dynamical systems, even though they do not fully characterize the system. For example, you can define the function that maps the current state to the state at some future time. In the case of the brain, we might want to define a function that maps an external perturbation of the system (i.e. a stimulus) to the state of the system at some future time. However, this is not well defined, because it depends on the state of the system at the time of the perturbation. This problem does not occur with formal neural networks precisely because these are not dynamical systems but mappings. The brain is spontaneously active, whether there is a “stimulus” or not. The very notion of the organism as something that responds to stimuli is the most naïve version of behaviorism. The organism has endogenous activity and a circular relation to its environment. Consider for example central pattern generators: rhythmic patterns produced in the absence of any input. Not all dynamical systems can be framed within computability theory; in fact most of them, including the brain, cannot, because they are not mappings.

Conclusion

As I have argued in my essay on neural coding, there are two core problems with the computer metaphor of the brain (it should be clear by now that this is a metaphor and not a property). One is that it tries to match two causal structures that are totally incongruent, just like dominoes and a tent. The other is that the computer metaphor, just as the coding metaphor, implicitly assumes an external operator – who programs it / interprets the code. Thus, what these two metaphors fundamentally miss is the epistemic autonomy of the organism.

What is computational neuroscience? (XXXIII) The interactivist model of cognition

The interactivist model of cognition has been developed by Mark Bickhard over the last 40 years or so. It is related to the viewpoints of Gibson and O’Regan, among others. The model is described in a book (Bickhard and Terveen, 1996) and a more recent review (Bickhard 2008).

It starts with a criticism of what Bickhard calls “encodingism”, the idea that mental representations are constituted by encodings, i.e. correspondences between things in the world and symbols (this is very similar to my criticism of the neural coding metaphor, except Bickhard’s angle is cognitive science while mine was neuroscience). The basic argument is that the encoding “crosses the boundary of the epistemic agent”: the perceptual system stands on only one side of the correspondence, so there is no way it can interpret symbols in terms of things in the world since it never has access to things in the world at any point. The interpretation of the symbols in terms of things in the world would require an interpreter, some entity that makes sense of a priori arbitrary symbols. But this was precisely the epistemic problem to be solved, so the interpreter is a homunculus and this is an incoherent view. This is related to the skeptic argument about knowledge: there cannot be valid knowledge since we acquire knowledge by our senses and we cannot step outside of ourselves to check that it is valid. Encodingism fails the skeptic objection. Note that Bickhard refutes neither the possibility of representations nor even the possibility of encodings, but rather the claim that encodings can be foundational for representations. There can be derivative encodings, based on existing representations (for example, Morse code is a derivative encoding, which presupposes that we know about both letters and dots and dashes).

A key feature that a representational system must have is what Bickhard calls “system-detectable errors”. A representational system must be able to test whether its representations are correct or not. This is not possible in encodingism because the system does not have access to what is being represented (knowledge that cannot be checked is what I called “metaphysical knowledge” in my Subjective physics paper). No learning is possible if there are no system-detectable errors. This is the problem of normativity.

The interactivist model proposes the following solution: representations are anticipations of potential interactions and of their expected impact on future states of the system, or on the future course of processes of the system (this is close to Gibson’s “affordances”). I give an example taken from Subjective physics. Consider a sound source located somewhere in space. What does it mean to know where the sound came from? In the encoding view, we would say that the system has a mapping between the angle of the source and properties of the sounds, and so it infers the source’s angle from the captured sounds. But what can this mean? Is the inferred angle in radians or degrees? Surely radians and degrees cannot make sense for the perceiver and cannot have been learned (this is what I called “metaphysical knowledge”), so the representation cannot actually take the form of the physical angle of the source. Rather, what it means for the source to be at a given position is (for example) that you expect that moving your eyes in a particular way will bring the source onto your fovea (see more detail about the Euclidean structure of space and related topics in Subjective physics). Thus, the notion of space is a representation of the expected consequences of certain types of actions.
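Here is a minimal toy sketch of this idea (my own illustration, with invented names and numbers; it is not Bickhard's formalism): the "representation" of the source position is a motor command expected to bring the source onto the fovea, and the residual retinal offset after acting is an error that the system can detect by itself.

```python
def retinal_offset_after_rotation(true_source_angle, rotation_command):
    """Stand-in for the world plus the sensors: after rotating the eyes by
    rotation_command, where does the source land on the retina (0 = fovea)?
    The agent never accesses true_source_angle directly, only this offset."""
    return true_source_angle - rotation_command

# The representation: an anticipated interaction, expressed in the agent's own motor units.
rotation_command = 0.3

# Acting tests the representation: the residual offset is a system-detectable error,
# available to the agent without any external interpreter or physical units.
error = retinal_offset_after_rotation(true_source_angle=0.5,
                                      rotation_command=rotation_command)
rotation_command += error            # revise the anticipation for next time
print(error, rotation_command)       # 0.2, then a corrected command of 0.5
```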

The interactivist model of representations has the desirable property that it has system-detectable errors: a representation can be correct or not, depending on whether the anticipation turns out to be correct or not. Importantly, what is anticipated is internal states, and therefore the representation does not cross the boundary of the epistemic agent. Contrary to standard models of representation, the interactivist model successfully addresses the skeptic argument.

The interactivist model is described at a rather abstract level, often referring to abstract machine theory (states of automata). Thus, it leaves aside the problem of its naturalization: how is it instantiated by the brain? Important questions to address are: What is a ‘state’ of the brain, in particular given that the brain is a continuously active dynamical system in which no “end state” can be identified? How do we cope with its distributed nature, that is, the fact that the epistemic agent is itself constituted of a web of interacting elementary epistemic agents? How are representations built and instantiated?

What is computational neuroscience? (XXXII) The problem of biological measurement (2)

In the previous post, I pointed out differences between biological sensing and physical measurement. A direct consequence is that it is not so straightforward to apply the framework of control theory to biological systems. At the level of behavior, it seems clear that animal behavior involves control; this is well documented in the case of motor control. But this is the perspective of an external observer: the target value, the actual value and the error criterion are identified with physical measurements made by an external observer. How does the organism achieve this control from its own perspective?

What the organism does not do, at least not directly, is measure the physical dimension and compare it to a target value. Rather, the biological system is influenced by the physical signal and reacts in a way that makes the physical dimension closer to a target value. How? I do not have a definite answer to this question, but I will explore a few possibilities.

Let us first explore a conventional possibility. The sensory neuron encodes the sensory input (e.g. muscle stretch) in some way; the control system decodes it, and then compares it to a target value. So, for example, let us say that the sensory neuron is an integrate-and-fire neuron. If the input is constant, then the interspike interval can be mapped back to the input value. If the input is not constant, it is more complicated, but estimates are possible. There are various studies relevant to this problem (for example Lazar (2004); see also the work of Sophie Denève, e.g. 2013). But all these solutions require knowing quite precisely how the input has been encoded. Suppose for example that the sensory neuron adapts with some time constant. Then the decoder somehow needs to de-adapt. But to do this correctly, one needs to know the time constant accurately enough, otherwise biases are introduced. If we consider that the encoder itself learns, e.g. by adapting to signal statistics (as in the efficient coding hypothesis), then the properties of the encoder must be considered unknown by the decoder.
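A minimal sketch of the difficulty (using a perfect integrator rather than a full leaky model, with invented parameters): decoding the interspike interval works only if the decoder knows the encoder's parameters, and if the encoder adapts without the decoder de-adapting accordingly, the estimate is systematically biased.

```python
THETA = 1.0   # spike threshold of the (non-leaky) integrate-and-fire sensor

def encode_isi(signal_value, threshold=THETA):
    """Perfect integrate-and-fire with constant input: the membrane potential
    integrates the signal and fires at the threshold, so the interspike
    interval is threshold / signal_value."""
    return threshold / signal_value

def decode_isi(isi, assumed_threshold=THETA):
    """Map the interval back to the signal; this requires knowing the encoder."""
    return assumed_threshold / isi

true_signal = 2.0
print(decode_isi(encode_isi(true_signal)))    # 2.0: correct if the decoder knows the encoder

# If the encoder has adapted (here, its effective threshold grew by 20%)
# and the decoder has not de-adapted, the estimate is biased:
print(decode_isi(encode_isi(true_signal, threshold=1.2 * THETA)))   # ~1.67 instead of 2.0
```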

Can the decoder learn to decode the sensory spikes? The problem is that it does not have access to the original signal. The key question then is: what could the error criterion be? If the system has no access to the original signal but only to streams of spikes, then how could it evaluate an error? One idea is to make an assumption about some property of the original signal. One could for example assume that the original signal varies slowly, in contrast with the spike train, which is a highly fluctuating signal. Thus we may look for a slow reconstruction of the signal from the spike train; this is in essence the idea of slow feature analysis. But the original signal might not be slowly fluctuating, as it is influenced by the actions of the controller, so it is not clear that this criterion would work.
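As a rough sketch of that idea (my own toy example, with arbitrary numbers; it is a slowness prior implemented by low-pass filtering, not slow feature analysis proper): the spike train is smoothed with a slow kernel, which recovers the hidden rate only insofar as the slowness assumption holds.

```python
import numpy as np

def lowpass_reconstruction(spikes, dt, tau):
    """Exponentially smooth a spike train (kernel of integral 1), i.e. assume
    the underlying signal varies slowly compared to the spikes themselves."""
    estimate, estimates = 0.0, []
    for s in spikes:
        estimate += dt * (-estimate / tau) + s / tau
        estimates.append(estimate)
    return np.array(estimates)

rng = np.random.default_rng(0)
dt = 0.001                                         # s
t = np.arange(0, 2, dt)
rate = 20 + 10 * np.sin(2 * np.pi * 0.5 * t)       # hidden slow signal (spikes/s)
spikes = rng.random(len(t)) < rate * dt            # Poisson-like spike train
estimate = lowpass_reconstruction(spikes, dt, tau=0.2)
print(np.corrcoef(rate[500:], estimate[500:])[0, 1])   # tracks the slow signal, imperfectly
```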

Thus it is not so easy to design a control system that decodes the sensory neuron’s activity into the original signal so as to compare it to a target value. But beyond this technical issue (how to learn the decoder), there is a more fundamental question: why split the work into two units (encoder/decoder), if the function of the second one is essentially to undo the work of the first one?

An alternative is to examine the system as a whole. We consider the physical system (environment), the sensory neuron, the actuator, and the interneurons (corresponding to the control system). Instead of seeing the sensory neuron as involved in an act of measurement and communication and the interneurons as involved in an act of interpretation and command, we see the entire system as a distributed dynamical system with a number of structural parameters. In terms of dynamical systems (rather than control), the question becomes: is the target value for the physical dimension an attractive fixed point of this system, or more generally, is there such a fixed point at all (as opposed to persistent fluctuations)? We can then ask complementary questions:

  • robustness: is the fixed point robust to perturbations, for example changes in properties of the sensor, actuator or environment?
  • optimality: are there ways to adjust the structure of the system so that the firing rate is minimized (for example)?
  • control: can we change the fixed point by an intervention on this system? (e.g. on the interneurons)

Thus, the problem becomes one of designing a spiking system that has an attractive fixed point in the physical dimension, with some desirable properties. Framing the problem in this way does not necessarily require that the physical dimension is explicitly extracted (“decoded”) from the activity of the sensory neuron. If we look at such a system, we might not be able to identify in any of the neurons a quantity that corresponds to the physical signal, or to the target value. Rather, physical signal and target value are to be found in the physical environment, and it is a property of the coupled dynamical system (neurons-environment) that the physical signal tends to approach the target value.
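As an illustration of this way of framing the problem, here is a toy closed loop (my own sketch, with arbitrary parameters and no claim of physiological realism): a perturbed physical variable drives a spiking sensor, each spike triggers a fixed actuator kick, and the variable settles at a stable value even though no element of the loop decodes it or holds an explicit target.

```python
import numpy as np

dt = 0.001        # s
drift = 5.0       # the environment pushes x upward at 5 units/s (a perturbation)
kick = 0.05       # each spike makes the actuator pull x down by this amount
tau = 0.02        # membrane time constant of the sensory neuron (s)
threshold = 1.0   # spike threshold

x = 0.0           # physical variable (e.g. muscle stretch)
v = 0.0           # membrane potential of the sensory neuron
history = []
for _ in range(20000):          # 20 s of simulated time
    x += drift * dt             # physics: constant perturbation
    v += dt * (x - v) / tau     # transduction: x drives the sensor
    if v >= threshold:          # the sensor spikes...
        v = 0.0
        x -= kick               # ...and the actuator reacts to each spike
    history.append(x)

print(np.mean(history[-5000:]))  # x settles near a fixed point (~2.5 with these parameters)
```

The fixed point is set jointly by the drift, the kick size, the threshold and the time constant; changing any of these (the third question above) moves it, which is the sense in which the loop controls x without ever representing it.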

What is computational neuroscience? (XXXI) The problem of biological measurement (1)

We tend to think of sensory receptors (photoreceptors, inner hair cells) or sensory neurons (retinal ganglion cells, auditory nerve fibers) as measuring physical dimensions, for example light intensity or acoustical pressure, or some function of them. The analogy is with physical measuring instruments, like a thermometer or a microphone. This confers a representational quality on the activity of neurons, an assumption that is at the core of the neural coding metaphor. I explain at length why that metaphor is misleading in many ways in an essay (Brette (2018) Is coding a relevant metaphor for the brain?). Here I want to examine more specifically the notion of biological measurement and the challenges it poses.

This notion comes about not only in classical representationalist views, where neural activity is seen as symbols that the brain then manipulates (the perception-cognition-action model, also called the sandwich model), but also, less obviously, in alternative views. For example, one alternative is to see the brain not as a computer system (encoding symbols, then manipulating them) but as a control system (see Paul Cisek’s behavior as interaction, William Powers’ perceptual control theory, Tim van Gelder’s dynamical view of cognition). In this view, the activity of neurons does not encode stimuli. In fact there is no stimulus per se, as Dewey pointed out: “the motor response determines the stimulus, just as truly as sensory stimulus determines the movement.”

A simple case is feedback control: the system tries to maintain some input at a target value. To do this, the system must compare the input with an internal value. We could imagine for example something like an idealized version of the stretch reflex: when the muscle is stretched, sensory feedback triggers a contraction, and we want to maintain the muscle length constant. But this apparently trivial task raises a number of deep questions, as does, more generally, the application of control theory to biological systems. Suppose there is a sensor, a neuron that transduces some physical dimension into spike trains, for example the stretch of a muscle. There is also an actuator, which reacts to a spike by a physical action, for example contracting the muscle with a particular time course. I chose a spike-based description not just because it corresponds to the physiology of the stretch reflex, but also because it illustrates some fundamental issues (which would also exist with graded transduction, but less obviously so).

Now we have a neuron, or a set of neurons, which receive these sensory inputs and send spikes to the actuator. For this discussion, it is not critical that these are actually neurons; we can just consider that there is a system there, and we ask how this system should be designed so as to successfully achieve a control task.

The major issue here is that the control system does not directly deal with the physical dimension. At first sight, this might seem a minor issue: the physical dimension gets transduced, and we could simply define the target value in the transduced dimension (e.g. the current). But the problem is more serious. What the control system deals with is not simply a function of the physical dimension. More accurately, transduction is a nonlinear dynamical system influenced by a physical signal. The physical signal can be constant, for example, while the transduced current decays (adaptation) and the sensory neuron outputs spike trains, i.e., a highly variable signal. This poses a much more serious problem than a simple calibration problem. When the controlled physical value is at the target value, the sensory neuron might be spiking, perhaps not even at a regular rate. The control system should react to that particular kind of signal by not acting, while it should act when the signal deviates from it. But how can the control system identify the target state, or even know whether to act in one direction or the opposite one?

Adaptation in neurons is often depicted as an optimization of the information transmitted, in line with the metaphor of the day (coding). But the relevant question is: how does the receiver of this “information” know how the neuron has adapted? Does it have to de-adapt, to somehow be matched to the adaptive process of the encoding neuron? (This problem has to do with the dualistic structure of the neural coding metaphor.)

There are additional layers of difficulty. We have first recognized that transduction is not a simple mapping from a physical dimension to a biological (e.g. electrochemical) dimension, but rather a dynamical system influenced by a physical signal. Now this dynamical system depends on the structure of the sensory neuron. It depends for example on the number of ionic channels and their properties, and we know these are highly plastic and indeed quite variable both across time and across cells. This dynamical system also depends on elements of the body, or more generally the neuron’s environment. For example, the way acoustical pressure is transduced into current by an inner hair cell obviously depends on the acoustical pressure at the eardrum, but that physical signal depends on the shape of the ear, which filters sounds. Properties of neurons also change with time, through development and aging. Thus, we cannot assume that the dynamical transformation from physical signal to biological signal is a fixed one. Somehow, the control system has to work despite this huge plasticity and the dynamical nature of the sensors.

Let us pause for a moment and outline a number of differences between physical measurements, as with a thermometer, and biological measurements (or “sensing”):

  • The physical meter is calibrated with respect to an external reference, for example 0°C is when water freezes, while 100°C is when it boils. The biological sensor cannot be calibrated with respect to an external reference.
  • The physical meter produces a fixed value for a stationary signal. The biological sensor produces a dynamical signal in response to a stationary signal. More accurately, the biological sensor is a nonlinear dynamical system influenced by the physical signal.
  • The physical meter is meant to be stable, in that the mapping from physical quantity to measurement is fixed. When it is not, this is considered an error. The biological sensor does not have fixed properties. Changes in properties occur in the normal course of life, from birth to death, and some changes in properties are interpreted as adaptations, not errors.

From these differences, we realize that biological sensors do not provide physical measurements in the usual sense. The next question, then, is how a biological system can control a physical dimension with biological sensors that do not act as measurements of that dimension.

What is computational neuroscience? (XXX) Is the brain a computer?

It is sometimes stated as an obvious fact that the brain carries out computations. Computational neuroscientists sometimes see themselves as looking for the algorithms of the brain. Is it true that the brain implements algorithms? My point here is not to answer this question, but rather to show that the answer is not self-evident, and that it can only be true (if at all) at a fairly abstract level.

One line of argument is that the models of the brain that we find in computational neuroscience (neural network models) are algorithmic in nature, since we simulate them on computers. And wouldn’t it be a sort of vitalistic claim to say that neural networks cannot, in principle, be simulated on a computer?

There is an important confusion in this argument. At a low level, neural networks are modelled biophysically as dynamical systems, in which time corresponds to the actual temporality of the real world (as opposed to the discrete temporality of algorithms). Mathematically, these are typically differential equations, possibly hybrid systems (i.e. coupled by timed pulses), in which time is a continuous variable. Such models can of course be simulated on a computer using discretization schemes. For example, we choose a time step and compute the state of the network at time t+dt from the state at time t. This algorithm, however, implements a simulation of the model; it is not the model that implements the algorithm. The discretization is nowhere to be found in the model. The model itself, being a continuous-time dynamical system, is not algorithmic in nature. It is not described as a discrete sequence of operations; only the simulation of the model is algorithmic, and different algorithms can simulate the same model.
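A minimal sketch of the distinction (with arbitrary parameter values): the model is a differential equation with no time step in it, and the time step belongs to the simulation algorithm, which could be replaced by another one (a smaller step, a different integration scheme) without changing the model.

```python
# The model: a continuous-time dynamical system, tau * dV/dt = -V + I(t).
tau = 0.02
I = lambda t: 1.0 if t > 0.1 else 0.0

# One possible simulation algorithm (forward Euler). The time step dt is a
# property of the algorithm, not of the model.
dt = 0.0001
V = 0.0
for step in range(int(0.3 / dt)):
    V += dt * (-V + I(step * dt)) / tau
print(V)   # close to the exact value 1 - exp(-0.2/tau) ≈ 0.99995
```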

If we put this confusion aside, then the claim that neural networks implement algorithms becomes less obvious. It means that trajectories of the dynamical system can be mapped to the discrete flow of an algorithm. This requires: 1) identifying states with representations of some variables (for example stimulus properties, or symbols); 2) identifying trajectories from one state to another as specific operations. In addition, for the algorithmic view to be of any use, there should be a sequence of operations, not just one (i.e., describing the output as a function of the input is not an algorithmic description).

A key difficulty in this identification is temporality: the state of the dynamical system changes continuously, so how can this be mapped to discrete operations? A typical approach in neuroscience is to consider not states but properties of trajectories. For example, one would consider the average firing rate of a population of neurons in a given time window, and the rate of another population in another time window. The relation between these two rates in the context of an experiment would define an operation. As stated above, a sequence of such relations should be identified in order to qualify as an algorithm. But this mapping seems possible only within a feedforward flow; coupling poses a greater challenge for an algorithmic description. No known nervous system, however, has a feedforward connectome.

I am not claiming here that the function of the brain (or mind) cannot possibly be described algorithmically. Probably some of it can be. My point is rather that a dynamical system is not generically algorithmic. A control system, for example, is typically not algorithmic (see the detailed example of Tim van Gelder, What might cognition be if not computation?). Thus a neural dynamical system can only be seen as an algorithm at a fairly abstract level, which can probably address only a restricted subset of its function. It could be that control, which also attaches function to dynamical systems, is a more adequate metaphor of brain function than computation. Is the brain a computer? Given the rather narrow application of the algorithmic view, the reasonable answer should be: quite clearly not (maybe part of cognition could be seen as computation, but not brain function generally).

What is computational neuroscience? (XXIX) The free energy principle

The free energy principle is the theory that the brain manipulates a probabilistic generative model of its sensory inputs, which it tries to optimize by either changing the model (learning) or changing the inputs (action) (Friston 2009; Friston 2010). The “free energy” is related to the error between predictions and actual inputs, or “surprise”, which the organism wants to minimize. It has a more precise mathematical formulation, but the conceptual issues I want to discuss here do not depend on it.

Thus, it can be seen as an extension of the Bayesian brain hypothesis that accounts for action in addition to perception. It shares the conceptual problems of the Bayesian brain hypothesis, namely that it focuses on statistical uncertainty, inferring the variables of a model (called “causes”), when the real challenge is to build and manipulate the structure of the model. It also shares issues with the predictive coding concept, namely a conflation between a technical sense of “prediction” (expectation of the future signal) and a broader sense that is more ecologically relevant (if I do X, then Y will happen). In my view, these are the main issues with the free energy principle. Here I will focus on an additional issue that is specific to the free energy principle.

The specific interest of the free energy principle lies in its formulation of action. It resonates with a very important psychological theory called cognitive dissonance theory. That theory says that you try to avoid dissonance between facts and your system of beliefs, by either changing the beliefs in a small way or avoiding the facts. When there is a dissonant fact, you generally don’t throw away your entire system of beliefs: rather, you alter the interpretation of the fact (think of political discourse or, in fact, scientific discourse). Another strategy is to avoid the dissonant facts: for example, to read newspapers that tend to have the same opinions as yours. So there is some support in psychology for the idea that you act so as to minimize surprise.

Thus, the free energy principle acknowledges the circularity of action and perception. However, it is quite difficult to make it account for a large part of behavior. A large part of behavior is directed towards goals; for example, to get food and sex. The theory anticipates this criticism and proposes that goals are ingrained in priors. For example, you expect to have food. So, for your state to match your expectations, you need to seek food. This is the theory’s solution to the so-called “dark room problem” (Friston et al., 2012): if you want to minimize surprise, why not shut off stimulation altogether and go to the closest dark room? Solution: you are not expecting a dark room, so you are not going there in the first place.

Let us consider a concrete example to show that this solution does not work. There are two kinds of stimuli: food, and no food. I have two possible actions: to seek food, or to sit and do nothing. If I do nothing, then with 100% probability, I will see no food. If I seek food, then with, say, 20% probability, I will see food.

Let’s say this is the world in which I live. What does the free energy principle tell us? To minimize surprise, it seems clear that I should sit: I am then certain to see no food, so there is no surprise at all. The proposed solution is that I have a prior expectation to see food. So, to minimize surprise, I should put myself in a situation where I might see food, i.e., seek food. This seems to work. However, if there is any learning at all, then I will quickly observe that the probability of seeing food is actually 20%, and my expectations will be adjusted accordingly. I will also observe that between two food expeditions, the probability of seeing food is 0%. Once this has been observed, surprise is minimal when I do not seek food. So I die of hunger. It follows that the free energy principle does not survive Darwinian competition.
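The arithmetic behind this toy example, taking surprise to be the negative log probability of the outcome under the learned model (the standard information-theoretic definition):

```python
import math

def expected_surprise(p_food):
    """Expected surprise of the outcome, with surprise = -log p(outcome)."""
    return sum(-p * math.log(p) for p in (p_food, 1 - p_food) if p > 0)

# Once the true statistics of this toy world have been learned:
print(expected_surprise(p_food=0.0))   # sit and do nothing: 0.0 (never surprised)
print(expected_surprise(p_food=0.2))   # seek food: about 0.50
```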

Thus, either there is no learning at all and the free energy principle is just a way of calling predefined actions “priors”; or there is learning, but then it doesn’t account for goal-directed behavior.

The idea of acting so as to minimize surprise resonates with some aspects of psychology, like cognitive dissonance theory, but it does not constitute a complete theory of mind, except possibly of the depressed mind. Consider for example the experience of flow (as in surfing): you seek a situation that is controllable but sufficiently challenging to engage your entire attention; in other words, you voluntarily expose yourself to a (moderate amount of) surprise, and in any case certainly not a minimal amount of surprise.

What is computational neuroscience? (XXVIII) The Bayesian brain

Our sensors give us incomplete, noisy, and indirect information about the world. For example, estimating the location of a sound source is difficult because in natural contexts, the sound of interest is corrupted by other sound sources, reflections, etc. Thus it is not possible to know the position of the source with certainty. The ‘Bayesian coding hypothesis’ (Knill & Pouget, 2004) postulates that the brain represents not the most likely position, but the entire probability distribution of the position. It then uses those distributions to do Bayesian inference, for example when combining different sources of information (say, auditory and visual). This would allow the brain to infer the most likely position optimally. There is indeed some evidence for optimal inference in psychophysical experiments – although there is also some contradicting evidence (Rahnev & Denison, 2018).
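For concreteness, this is the standard textbook form of such optimal cue combination, for two independent Gaussian cues (the numbers below are invented): each cue is weighted by its reliability, and the combined estimate is more precise than either cue alone.

```python
def combine_gaussian_cues(mu_a, var_a, mu_v, var_v):
    """Optimal combination of two independent Gaussian cues (e.g. an auditory
    and a visual estimate of position): a precision-weighted average."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    combined_mean = w_a * mu_a + (1 - w_a) * mu_v
    combined_var = 1 / (1 / var_a + 1 / var_v)
    return combined_mean, combined_var

print(combine_gaussian_cues(mu_a=10.0, var_a=4.0, mu_v=6.0, var_v=1.0))
# (6.8, 0.8): pulled toward the more reliable (visual) cue, with reduced variance
```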

The idea has some appeal. The problem is that, by framing perception as a statistical inference problem, it focuses on the most trivial type of uncertainty: statistical uncertainty. This is illustrated by the following quote: “The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables”. Implicit in this representation is a particular model, for which the variables are defined. Typically, one model describes one particular experimental situation. For example, the model would describe the distribution of auditory cues associated with the position of the sound source. Another situation would be described by a different model; for example, a situation with two sound sources would require a model with two variables. Or if the listening environment is a room and the size of that room might vary, then we would need a model with the dimensions of the room as variables. In any of these cases where we have identified and fixed the parametric sources of variation, the Bayesian approach works fine, because we are indeed facing a problem of statistical inference. But that framework doesn’t fit any real-life situation. In real life, perceptual scenes have a variable structure, which corresponds to the model in statistical inference (there is one source, or two sources, we are in a room, the second source comes from the window, etc.). The perceptual problem is therefore not just to infer the parameters of the model (the dimensions of the room, etc.), but also the model itself, its structure. Thus, it is not possible in general to represent an auditory scene by a probability distribution on a set of parameters, because the very notion of a parameter already assumes that the structure of the scene is known and fixed.

Inferring the parameters of a known statistical model is relatively easy. What is really difficult, and is still challenging for machine learning algorithms today, is to identify the structure of a perceptual scene: what constitutes an object (object formation), how objects are related to each other (scene analysis). These fundamental perceptual processes do not exist in the Bayesian brain. This touches on two very different types of uncertainty: statistical uncertainty, i.e. variations that can be interpreted and expected in the framework of a model; and epistemic uncertainty, where the model itself is unknown (the difference has been famously explained by Donald Rumsfeld).

Thus, the “Bayesian brain” idea addresses an interesting problem (statistical inference), but it trivializes the problem of perception, by missing the fact that the real challenge is epistemic uncertainty (building a perceptual model), not statistical uncertainty (tuning the parameters): the world is not noisy, it is complex.

What is computational neuroscience? (XXVII) The paradox of the efficient code and the neural Tower of Babel

A pervasive metaphor in neuroscience is the idea that neurons “encode” stuff: some neurons encode pain; others encode the location of a sound; maybe a population of neurons encodes some other property of objects. What does this mean? In essence, that there is a correspondence between some objective property and neural activity: when I feel pain, this neuron spikes; or, the image I see is “represented” in the firing of visual cortical neurons. The mapping between the objective properties and neural activity is the “code”. How insightful is this metaphor?

An encoded message is understandable to the extent that the reader knows the code. But the problem with applying this metaphor to the brain is that only the encoded message is communicated, not the code, and not the original message. Mathematically, original message = encoded message + code, but only one of these terms is communicated. This could still work if there were a universal code that we could assume all neurons can read, the “language of neurons”, or if somehow some information about the code could be gathered from the encoded messages themselves. Unfortunately, this is in contradiction with the main paradigm in neural coding theory, “efficient coding”.

The efficient coding hypothesis stipulates that neurons encode signals into spike trains in an efficient way, that is, using a code such that all redundancy is removed from the original message while preserving information, in the sense that the encoded message can be mapped back to the original message (Barlow, 1961; Simoncelli, 2003). This implies that with a perfectly efficient code, encoded messages are indistinguishable from random. Since the code is determined by the statistics of the inputs and only the encoded messages are communicated, a code is efficient to the extent that it is not understandable by the receiver. This is the paradox of the efficient code.
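A rough way to see this with a generic compressor (zlib here stands in for an "efficient code"; this is only an analogy, not a model of neural coding): once redundancy is removed, the encoded message looks statistically like noise, so a receiver who does not know the code has nothing to latch onto.

```python
import math, os, random, zlib
from collections import Counter

def byte_entropy(data):
    """First-order entropy of the byte distribution, in bits per byte
    (8 bits/byte would be indistinguishable from uniform random bytes at this level)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A redundant 'message': words drawn from a small vocabulary.
rng = random.Random(0)
words = ["neuron", "spike", "code", "signal", "brain", "the", "of", "a"]
message = " ".join(rng.choice(words) for _ in range(20000)).encode()

print(byte_entropy(message))                     # ~4 bits/byte: redundant, structured
print(byte_entropy(zlib.compress(message, 9)))   # close to 8 bits/byte: looks like noise
print(byte_entropy(os.urandom(20000)))           # ~8 bits/byte: actual noise
```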

In the neural coding metaphor, the code is private and specific to each neuron. If we follow this metaphor, this means that all neurons speak a different language, a language that allows expressing concepts very concisely but that no one else can understand. Thus, according to the coding metaphor, the brain is a Tower of Babel.

Can this work?

What is computational neuroscience? (XXVI) Is optimization a good metaphor of evolution?

Is the brain the result of optimization, and if so, what is the optimization criterion? The popular argument in favor of the optimization view goes as follows. The brain is the result of Darwinian evolution, and therefore is optimally adapted to its environment, ensuring maximum survival and reproduction rates. In this view, to understand the brain is primarily to understand what “adapted” means for a brain, that is, what is the criterion to be optimized.

Previously, I have pointed out a few difficulties with the optimality arguments used in neuroscience, in particular the problem of specification (what is being optimized) and the fact that evolution is a history-dependent process, unlike a global optimization procedure. An example of this history dependence is the fascinating case of mitochondria. Mitochondria are organelles present in all eukaryotic cells, which produce most of the cellular energy in the form of ATP. To date, the main view is that these organelles are a case of symbiosis: mitochondria were once prokaryotic cells that were captured and farmed. This symbiosis has been selected and conserved through evolution, but optimization does not seem to be the most appropriate metaphor in this case.

Nonetheless, the optimization metaphor can be useful when applied to circumscribed problems that a biological organism might face, for example the energy consumption of action potential propagation. We can claim for example that, everything else being equal, an efficient axon is better than an inefficient one (with the caveat that in practice, not everything else can be made equal). But when applied at the scale of an entire organism, the optimization metaphor starts facing more serious difficulties, which I will discuss now.

When considering an entire organism, or perhaps an organ like the brain, what criterion can we possibly choose? Recently, I started reading “Guitar Zero” by Gary Marcus. The author points out that learning music is difficult, and argues that the brain has evolved for language, not music. This statement is deeply problematic. What does it mean that the brain has evolved for language? Language does not exist prior to its speakers, so it cannot be that language was an evolutionary (“optimization”) criterion for the brain, unless we have a more religious view of evolution. Rather, evolutionary change can create opportunities, which might be beneficial for the survival of the species, but there is no predetermined optimization criterion.

Another example is the color visual system of bees (see for example Ways of coloring by Thompson et al.). A case can be made that the visual system of bees is adapted to the color of flowers they are interested in. But conversely, the color of flowers is adapted to the visual system of bees. This is a case of co-evolution, where the “optimization criterion” changes during the evolutionary process.

Thus, the optimization criterion does not exist prior to the optimization process, and this makes the optimization metaphor weak.

A possible objection is that there is a preexisting optimization criterion, namely survival or reproduction rate. While this might be correct, it makes the optimization metaphor not very useful. In particular, it applies equally to all living species. The point is, there are many species and they are different, even though the optimization criterion is the same. Not all of them have a brain. Thus, optimization does not explain why we have a brain. Species that have a brain have different brains. The nervous system of a nematode is not the same as that of a human, even though they are all equally well adapted, and have evolved for exactly the same amount of time. Therefore, the optimization view does not explain why we speak and nematodes don’t, for example.

The problem is that “fitness” is a completely contextual notion, which depends both on the environment and on the species itself. In a previous post where I discussed an “existentialist” view of evolution, I proposed the following thought experiment. Imagine a very ancient Earth with a bunch of living organisms that do not reproduce but can survive for an indefinite amount of time. By definition, they are adapted since they exist. Then at some point, an accident occurs such that one organism starts multiplying. It multiplies until it occupies the entire Earth and resources become scarce. At this point of saturation, organisms start dying. The probability of dying being the same for both non-reproducing organisms and reproducing ones, at some point there will be only reproducing organisms. Thus in this new environment, reproducing organisms are adapted, whereas non-reproducing ones are not. If we look at the history of evolution, we note that the world of species constantly changes. Species do not appear to converge to some optimal state, because as they evolve, the environment changes and so does the notion of fitness.

In summary, the optimization criterion does not exist prior to the optimization process, unless we consider a broad existentialist criterion such as survival, but then the optimization metaphor loses its usefulness.

What is computational neuroscience? (XXV) Are there biological models in computational neuroscience?

Computational neuroscience is the science of how the brain “computes”, that is, how the brain performs cognitive functions such as recognizing a face or walking. Here I will argue that most models of cognition developed in the field, especially as regards sensory systems, are actually not biological models but hybrid models consisting of a neural model together with an abstract model.

First of all, many neural models are not meant to be models of cognition. For example, there are models that were developed to explain the irregular spiking of cortical neurons, or oscillations. I will not consider them. According to the definition above, I categorize them as theoretical neuroscience rather than computational neuroscience. Here I consider, for example, models of perception, memory, and motor control.

An example that I know well is the problem of localizing a sound source from timing cues. There are a number of models, including a spiking neuron model that we have developed (Goodman and Brette, 2010). This model takes as input two sound waves, corresponding to the two monaural sounds produced by the sound source, and outputs the estimated direction of the source. But the neural model, of course, does not output a direction. Rather, the output of the neural model is the activity of a layer of neurons. In the model, we consider that direction is encoded by the identity of the maximally active neuron. In another popular model in the field, direction is encoded by the relative total activity of two groups of neurons (see our comparison of models in Goodman et al. 2013). In all models, there is a final step which maps the activity of neurons to estimated sound location, and this step is not a neural model but an abstract model. This causes big epistemological problems when it comes to assessing and comparing the empirical value of models because a crucial part of the models is not physiological. Some argue that neurons are tuned to sound location; others that population activity varies systematically with sound location. Both are right, and thus none of these observations is a decisive argument to discriminate between the models.
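To make the two "final steps" concrete, here are the two read-outs written as code (the preferred directions, the tuning curve and the hemispheric split below are illustrative choices, not the exact decoders of the published models): both map the same population activity vector to an estimated direction, and neither is itself a neural model.

```python
import numpy as np

preferred_directions = np.linspace(-90, 90, 19)     # degrees, one per model neuron

def decode_max(activity):
    """Direction read out as the preferred direction of the maximally active neuron."""
    return preferred_directions[np.argmax(activity)]

def decode_two_populations(activity):
    """Direction read out from the relative total activity of the left- and
    right-preferring groups (a hemispheric-difference read-out)."""
    left = activity[preferred_directions < 0].sum()
    right = activity[preferred_directions > 0].sum()
    return 90 * (right - left) / (right + left)

activity = np.exp(-((preferred_directions - 30) ** 2) / (2 * 20 ** 2))  # toy tuning to a source at 30 deg
print(decode_max(activity), decode_two_populations(activity))
# The two read-outs need not even agree; each defines a different abstract model
# on top of the same neural activity.
```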

The same is seen in other sensory modalities. The output is the identity of a face; or of an odor; etc. The symmetrical situation occurs in motor control models: this time the abstract model is on the side of the input (mapping from spatial position to neural activity or neural input). Memory models face this situation twice, with abstract models both on the input (the thing to be memorized) and the output (the recall).

Fundamentally, this situation has to do with the fact that most models in computational neuroscience take a representational approach: they describe how neural networks represent in their firing some aspect of the external world. The representational approach requires defining a mapping (called the “decoder”) from neural activity to objective properties of objects, and this mapping cannot be part of the neural model. Indeed, sound location is a property of objects and thus does not belong to the domain of neural activity. So no sound localization model can ever be purely neuronal.

Thus to develop biological models, it is necessary to discard the representational approach. Instead of “encoding” things, neurons control the body; neurons are agents (rather than painters in the representational approach). For example, a model of sound localization should be a model of an orientational response, including the motor command. The model explains not how space is “represented”, but how an animal orients its head (for example) to a sound source. When we try to model an actual behavior, we find that the nature of the problem changes quite significantly. For example, because a particular behavior is an event, neural firing must also be seen as events. In this context, counting spikes and looking at the mutual information between the count and some stimulus property is not very meaningful. What matters is the events that the spikes trigger in the targets (muscles or other neurons). The goal is not to represent the sensory signals but to produce an appropriate behavior. One also realizes that the relation between sensory signals and actions is circular, and therefore cannot be adequately described as “processing”: sensory signals make you turn the head, but if you turn the head, the sensory signals change.

Currently, most models of cognition in computational neuroscience are not biological models. They include neuron models together with abstract models, a necessity stemming from the representational approach. To make a biological model requires including a model of the sensorimotor loop. I believe this is the path that the community should take.