Predatory journals: what is the problem?

A recent article in Le Monde warns of a growing phenomenon in scientific publishing: predatory journals (see also the editorial). These are commercial publishers that publish scientific articles online for a fee, with no scientific ethics whatsoever, in particular accepting every article without peer review. Similarly, fake conferences are multiplying: companies organize scientific conferences for purely commercial purposes, with no concern for scientific quality.

In response, some institutions are starting to draw up blacklists of journals to avoid. This is understandable, since the phenomenon carries a significant cost. But this response overlooks the fundamental problem. We must face the facts: commercial ethics (the pursuit of profit) is not compatible with scientific ethics (the pursuit of truth). The companies in question are not illegal, as far as I know. They organize conferences that are real; they publish journals that are real. They simply care about their profit, not about scientific quality. We regard this as immoral; but a commercial company has no moral dimension: it is simply an organization whose purpose is to generate profit. We cannot expect commercial interests to coincide exactly, as if by magic, with scientific interests.

  1. The problem of commercial publishing

This is true at both ends of the academic publishing spectrum: for predatory journals as for prestigious ones. The article speaks of “fake science”; but most cases of scientific fraud have been uncovered in prestigious journals, not in predatory journals, which in any case are not read by the scientific community (see for example Brembs (2018) on the link between methodological quality and journal prestige). For prestigious commercial journals, the publishers’ commercial strategy is not to maximize the number of published articles, but to maximize the perceived prestige of these journals, which then serve as bait to sell the publisher’s journal collections. In other words, it is a brand strategy. This involves, in particular, a drastic selection of submitted articles, carried out by professional editors (that is, not by professional scientists) on the basis of the perceived importance of the results, thereby pushing a generation of scientists to inflate the claims of their articles. It also involves promoting dubious metrics such as the impact factor to public institutions, and more generally promoting a mythology of prestigious publication, namely the false and dangerous idea that an article should be judged by the prestige of the journal in which it is published rather than by its intrinsic scientific value, which is assessed by the scientific community, not by a commercial publisher, nor even by two anonymous scientists. Proposing to publish lists of bad journals does not solve the problem, because it implicitly subscribes to this perverse logic.

One only has to look at the margins made by the large multinational scientific publishers to understand that the commercial model is not suitable. For Elsevier, for example, margins are on the order of 40%. Simply reading this figure should convince us immediately that scientific publishing ought to be run by public institutions, or at least non-commercial ones (for example learned societies, as is already the case for a number of journals). What is the justification for calling on a commercial operator to run a public service, or any service? The rationale is that competition lowers costs and improves quality. But if margins are 40%, competition is evidently not operating. Why? Simply because when scientists submit an article, they do not choose the journal on the basis of price, nor even of the service provided (which is in reality essentially provided by volunteer scientists), but on the basis of the journal’s visibility and prestige. There is therefore no competition on price. The worst that could happen to a commercial publisher is for scientific articles to be judged on their intrinsic value rather than by the journal in which they are published, because this unique business model would then collapse and journals would compete on prices and on the services they must provide, like any other commercial company. That is the worst that can happen to commercial publishers, and the best that can happen to the scientific community. This is why commercial and scientific interests diverge.

In any case, we must face the facts: such enormous margins mean that the commercial model is inefficient. We should therefore immediately stop using commercial journals. This is not very difficult: public institutions are perfectly capable of running scientific journals; such journals exist and have for a long time. A recent example is eLife, currently one of the most innovative journals in biology. This should not be very surprising: the core activity of journals, namely the reviewing of articles, is already done by scientists, including at commercial publishers, which use their services for free. This does not mean that private companies cannot be called upon to provide services, for example hosting servers, managing websites, or providing infrastructure. But journals must no longer belong to commercial companies, whose interest is to manage these journals as brands. Scientific ethics is not compatible with commercial ethics.

How can this be done? In fact it is fairly obvious. Public authorities should cancel all subscriptions to commercial publishers and stop paying publication fees to these publishers. Nowadays it is not difficult to access the scientific literature without going through journals (through preprints, or simply by writing to the authors, who are generally delighted that someone is interested in their work). Part of the money saved can be reinvested in non-commercial scientific publishing.

  2. The myth of peer review

I now want to turn to a question of epistemology that is subtler but fundamental. What, at bottom, is the problem with predatory journals? Clearly, there is the waste of public money. But the Le Monde article also points to scientific problems, namely that false information is propagated without having been checked. The editorial indeed speaks of “the sacrosanct ‘peer review’”, which these journals do not carry out. But is that really the fundamental problem?

The idea that what gives a scientific article its value is that it has been validated by peer review before publication is a persistent myth, but a mistaken one nonetheless. It is false both empirically and theoretically.

Empirically, at any given time the literature contains contradictory conclusions on a great many subjects, published in traditional journals. Recent cases of fraud concern articles that had nevertheless been peer reviewed. But the same is true of a far larger number of non-fraudulent articles whose conclusions were later contested. The history of science is full of contradictory, coexisting scientific theories and of bitter debates between scientists. These debates take place precisely after publication, and scientific consensus generally forms rather slowly, practically never on the basis of a single article (see for example Imre Lakatos in philosophy of science, or Thomas Kuhn). Moreover, scientific results are also often circulated in the scientific community before formal publication; this is the case today with online preprints, but it was already partly the case before with conferences. The published article remains the reference because it provides precise details, in particular methodological ones, but the contribution of the reviewers solicited by journals is in most cases not essential, all the more so since it is generally not made public.

Theoretically, there is no reason why peer review should “validate” a scientific result. There is nothing magical about peer review: two, sometimes three scientists simply give their informed opinion on the manuscript. These scientists are no more expert than those who will read the article once it is published (I am of course talking about the scientific community, not the general public). The fact that an article is published in a journal says little in itself about how the community receives the results; when an article is rejected by one journal, it is resubmitted elsewhere. Final publication in no way attests to a scientific consensus. Moreover, in the case of empirical studies, reviewers are not actually in a position to verify the results, and in particular to check that there has been no fraud. All they can do is check that the methods used seem appropriate and that the interpretations seem sensible (two points that are often open to debate). To validate the results (but not the interpretations), one would at the very least need to be able to redo the experiments in question, which requires the necessary time and equipment. This indispensable work is done (or attempted), but it is not done at the time of publication, nor commissioned by the journal. It is done after publication by the scientific community. The work of “verification” (an inappropriate word, since there is no absolute truth in science, which is precisely what distinguishes it from religion) is the long-term work of the scientific community, not the one-off work of the journal.

This is the received idea that must be deconstructed: that the journal’s internal review somehow “validates” scientific results. It does not, it never has, and it cannot. Scientific validation is the very nature of the scientific enterprise, which is a collective, long-term effort. One cannot read an article and conclude “it is true”; for that, it has to be integrated into a body of scientific knowledge, and its interpretation confronted with different points of view (since every interpretation requires a theoretical framework).

Yet it is precisely this received idea that prestigious journals try to reinforce. We must resist it. The antidote is to make scientific debate public and transparent, whereas at present it often remains confined to the corridors of laboratories and conferences. It is claimed that peer review validates scientific results, but these reports are most of the time not published; and what about the unpublished reports when an article is rejected by a journal? How, then, can we know what the community thinks of it? Scientific debate must instead be made public. This is, for example, the ambition of sites such as PubPeer, which has brought a number of frauds to light, but which can also be used simply for scientific debate in general. Rather than making publication conditional on a confidential agreement between anonymous scientists, this system should be inverted: publish the article (which in fact already happens through preprints), then solicit the opinions of the community, which will also be published, argued, and discussed by the authors and the rest of the community. This is how scientists, but also the wider public, will be able to get a fairer view of the scientific value of published articles. Peer review is a fundamental principle of science, yes, but not the review carried out in confidence by journals; rather the one carried out in the open, with no time limit, by the scientific community.

What is computational neuroscience? (XXXII) The problem of biological measurement (2)

In the previous post, I pointed out differences between biological sensing and physical measurement. A direct consequence is that it is not so straightforward to apply the framework of control theory to biological systems. At the level of behavior, it seems clear that animal behavior involves control; this is well documented in the case of motor control. But this is the perspective of an external observer: the target value, the actual value and the error criterion are identified with physical measurements made by an external observer. But how does the organism achieve this control, from its own perspective?

What the organism does not do, at least not directly, is measure the physical dimension and compare it to a target value. Rather, the biological system is influenced by the physical signal and reacts in a way that makes the physical dimension closer to a target value. How? I do not have a definite answer to this question, but I will explore a few possibilities.

Let us first explore a conventional possibility. The sensory neuron encodes the sensory input (e.g. muscle stretch) in some way; the control system decodes it, and then compares it to a target value. So for example, let us say that the sensory neuron is an integrate-and-fire neuron. If the input is constant, then the interspike interval can be mapped back to the input value. If the input is not constant, it is more complicated but estimates are possible. There are various studies relevant to this problem (for example Lazar (2004); see also the work of Sophie Denève, e.g. 2013). But all these solutions require knowing quite precisely how the input has been encoded. Suppose for example that the sensory neuron adapts with some time constant. Then the decoder needs somehow to de-adapt. But to do it correctly, one needs to know the time constant accurately enough, otherwise biases are introduced. If we consider that the encoder itself learns, e.g. by adapting to signal statistics (as in the efficient coding hypothesis), then the properties of the encoder must be considered unknown to the decoder.
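As a minimal sketch of the constant-input case, assume a leaky integrate-and-fire neuron with membrane equation tau dV/dt = I - V, threshold 1 and reset to 0. This particular model, the function names and all parameter values are illustrative assumptions, not taken from the studies cited above. The interspike interval then has a closed form that can be inverted, and the sketch shows how a decoder that assumes the wrong time constant returns a biased estimate.

```python
import math

def lif_interspike_interval(I, tau=0.010, threshold=1.0):
    """Interspike interval (s) of a leaky integrate-and-fire neuron
    (tau dV/dt = I - V, reset to 0) driven by a constant input I."""
    if I <= threshold:
        raise ValueError("input too weak: the neuron never reaches threshold")
    return tau * math.log(I / (I - threshold))

def decode_input(isi, assumed_tau=0.010, threshold=1.0):
    """Map an interspike interval back to the constant input,
    given the time constant the decoder *believes* the encoder has."""
    return threshold / (1.0 - math.exp(-isi / assumed_tau))

true_input = 1.5
isi = lif_interspike_interval(true_input, tau=0.010)

# Decoder knows the encoder exactly: the input is recovered.
print(decode_input(isi, assumed_tau=0.010))
# Decoder assumes a time constant twice too large: the estimate is biased.
print(decode_input(isi, assumed_tau=0.020))
```

The inversion only works because the decoder shares the encoder's equation and parameters; any mismatch shows up directly as an error in the decoded value.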

Can the decoder learn to decode the sensory spikes? The problem is it does not have access to the original signal. The key question then is: what could the error criterion be? If the system has no access to the original signal but only streams of spikes, then how could it evaluate an error? One idea is to make an assumption about some properties of the original signal. One could for example assume that the original signal varies slowly, in contrast with the spike train, which is a highly fluctuating signal. Thus we may look for a slow reconstruction of the signal from the spike train; this is in essence the idea of slow feature analysis. But the original signal might not be slowly fluctuating, as it is influenced by the actions of the controller, so it is not clear that this criterion will work.

Thus it is not so easy to think of a control system which would decode the sensory neuron activity into the original signal so as to compare it to a target value. But beyond this technical issue (how to learn the decoder), there is a more fundamental question: why split the work into two units (encoder/decoder), if the function of the second one is essentially to undo the work of the first one?

An alternative is to examine the system as a whole. We consider the physical system (environment), the sensory neuron, the actuator, and the interneurons (corresponding to the control system). Instead of seeing the sensory neuron as involved in an act of measurement and communication and the interneurons as involved in an act of interpretation and command, we see the entire system as a distributed dynamical system with a number of structural parameters. In terms of dynamical systems (rather than control), the question becomes: is the target value for the physical dimension an attractive fixed point of this system, or more generally, is there such a fixed point at all (as opposed to sustained fluctuations)? We can then derive complementary questions:

  • robustness: is the fixed point robust to perturbations, for example changes in properties of the sensor, actuator or environment?
  • optimality: are there ways to adjust the structure of the system so that the firing rate is minimized (for example)?
  • control: can we change the fixed point by an intervention on this system? (e.g. on the interneurons)

Thus, the problem becomes one of designing a spiking system that has an attractive fixed point in the physical dimension, with some desirable properties. Framing the problem in this way does not necessarily require that the physical dimension is explicitly extracted (“decoded”) from the activity of the sensory neuron. If we look at such a system, we might not be able to identify in any of the neurons a quantity that corresponds to the physical signal, or to the target value. Rather, physical signal and target value are to be found in the physical environment, and it is a property of the coupled dynamical system (neurons-environment) that the physical signal tends to approach the target value.
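To make this concrete, here is a toy closed loop, offered only as a sketch under strong assumptions: a physical variable x (say muscle length, in arbitrary units) drifts upward at a constant rate, a non-leaky integrate-and-fire sensor is driven by gain × x, and each spike triggers a fixed contraction that reduces x. The model, the function names and every parameter value are hypothetical. No variable in the code stores a target value or a decoded estimate of x, yet on average x settles near drift / (kick × gain), which is a property of the coupled system.

```python
def simulate_loop(x0, gain=20.0, drift=1.0, kick=0.05, dt=1e-3, duration=10.0):
    """Simulate the coupled sensor-actuator-environment loop and
    return the trajectory of the physical variable x."""
    x, v = x0, 0.0
    trajectory = []
    for _ in range(round(duration / dt)):
        x += drift * dt        # environment: a load steadily stretches the muscle
        v += gain * x * dt     # sensor: integrates a current proportional to x
        if v >= 1.0:           # sensor spike
            v -= 1.0
            x -= kick          # actuator: each spike produces a fixed contraction
        trajectory.append(x)
    return trajectory

def late_mean(trajectory, fraction=0.2):
    """Average of x over the last part of the simulation."""
    tail = trajectory[-int(len(trajectory) * fraction):]
    return sum(tail) / len(tail)

# Different initial conditions end up around the same value of x.
print(late_mean(simulate_loop(0.2)), late_mean(simulate_loop(2.0)))
# Doubling the sensor gain moves the fixed point: it is not robust to this change.
print(late_mean(simulate_loop(0.2, gain=40.0)))
```

In this toy system the fixed point is attractive but shifts when the sensor gain changes, which is exactly the kind of robustness question listed above; a better design would have to make the fixed point insensitive to such parameters.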

What is computational neuroscience? (XXXI) The problem of biological measurement (1)

We tend to think of sensory receptors (photoreceptors, inner hair cells) or sensory neurons (retinal ganglion cells; auditory nerve fibers) as measuring physical dimensions, for example light intensity or acoustical pressure, or some function of them. The analogy is with physical measuring instruments, like a thermometer or a microphone. This confers a representational quality to the activity of neurons, an assumption that is at the core of the neural coding metaphor. I explain at length why that metaphor is misleading in many ways in an essay (Brette (2018) Is coding a relevant metaphor for the brain?). Here I want to examine more specifically the notion of biological measurement and the challenges it poses.

This notion comes about not only in classical representationalist views, where neural activity is seen as symbols that the brain then manipulates (the perception-cognition-action model, also called sandwich model), but also in alternative views, although less obviously. For example, one alternative is to see the brain not as a computer system (encoding symbols, then manipulating them) but as a control system (see Paul Cisek’s behavior as interaction, William Powers’ perceptual control theory, Tim van Gelder’s dynamical view of cognition). In this view, the activity of neurons does not encode stimuli. In fact there is no stimulus per se, as Dewey pointed out: “the motor response determines the stimulus, just as truly as sensory stimulus determines the movement.”

A simple case is feedback control: the system tries to maintain some input at a target value. To do this, the system must compare the input with an internal value. We could imagine for example something like an idealized version of the stretch reflex: when the muscle is stretched, a sensory feedback triggers a contraction, and we want to keep muscle length constant. But this apparently trivial task raises a number of deep questions, as does, more generally, the application of control theory to biological systems. I suppose there is a sensor, a neuron that transduces some physical dimension into spike trains, for example the stretch of a muscle. There is also an actuator, which reacts to a spike by a physical action, for example contracting the muscle with a particular time course. I chose a spike-based description not just because it corresponds to the physiology of the stretch reflex, but also because it will illustrate some fundamental issues (which would exist also with graded transduction, but less obviously so).

Now we have a neuron, or a set of neurons, which receive these sensory inputs and send spikes to the actuator. For this discussion, it is not critical that these are actually neurons; we can just consider that there is a system there, and we ask how this system should be designed so as to successfully achieve a control task.

The major issue here is that the control system does not directly deal with the physical dimension. At first sight, we could think this is a minor issue. The physical dimension gets transduced, and we could simply define the target value in the transduced dimension (e.g. the current). But here we see that the problem is more serious. What the control system deals with is not simply a function of the physical dimension. More accurately, transduction is a nonlinear dynamical system influenced by a physical signal. The physical signal can be constant, for example, while the transduced current decays (adaptation) and the sensory neuron outputs spike trains, i.e., a highly variable signal. This poses a much more serious problem than a simple calibration problem. When the controlled physical value is at the target value, the sensory neuron might be spiking, perhaps not even at a regular rate. The control system should react to that particular kind of signal by not acting, while it should act when the signal deviates from it. But how can the control system identify the target state, or even know whether to act in one or the opposite direction?
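The following sketch illustrates the point with a hypothetical adapting sensor: the physical stimulus is held constant, an adaptation variable slowly builds up and reduces the transduced current, and a leaky integrate-and-fire neuron turns that current into spikes. The equations and all parameter values are assumptions chosen for illustration, not a model of any particular receptor.

```python
def adapting_sensor(stimulus=2.0, duration=1.0, dt=1e-4,
                    tau_m=0.010, tau_a=0.100, adaptation_strength=0.4):
    """Spike times (s) of an adapting integrate-and-fire sensor
    driven by a constant physical stimulus."""
    v, a = 0.0, 0.0
    spike_times = []
    for step in range(round(duration / dt)):
        current = stimulus - adaptation_strength * a  # transduced current decays as a grows
        a += dt * (stimulus - a) / tau_a              # slow adaptation variable
        v += dt * (current - v) / tau_m               # membrane potential (forward Euler)
        if v >= 1.0:                                  # threshold crossing: spike and reset
            spike_times.append(step * dt)
            v = 0.0
    return spike_times

spikes = adapting_sensor()
intervals = [t2 - t1 for t1, t2 in zip(spikes, spikes[1:])]
# The stimulus never changes, yet the interspike interval lengthens over time.
print(intervals[0], intervals[-1])
```

A receiver that reads these intervals as a direct measurement would conclude that the stimulus is decreasing, although it is constant throughout.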

Adaptation in neurons is often depicted as an optimization of information transmitted, in line with the metaphor of the day (coding). But the relevant question is: how does the receiver of this “information” know how the neuron has adapted? Does it have to de-adapt, to somehow be matched to the adaptive process of the encoding neuron? (This problem has to do with the dualistic structure of the neural coding metaphor.)

There are additional layers of difficulty. We have first recognized that transduction is not a simple mapping from a physical dimension to a biological (e.g. electrochemical) dimension, but rather a dynamical system influenced by a physical signal. Now this dynamical system depends on the structure of the sensory neuron. It depends for example on the number of ionic channels and their properties, and we know these are highly plastic and indeed quite variable both across time and across cells. This dynamical system also depends on elements of the body, or let’s say more generally the neuron’s environment. For example, the way acoustical pressure is transduced into current by an inner hair cell depends obviously on the acoustical pressure at the eardrum, but that physical signal depends on the shape of the ear, which filters sounds. Properties of neurons also change with time, through development and aging. Thus, we cannot assume that the dynamical transformation from physical signal to biological signal is a fixed one. Somehow, the control system has to work despite this huge plasticity and the dynamical nature of the sensors.

Let us pause for a moment and outline a number of differences between physical measurements, as with a thermometer, and biological measurements (or “sensing”):

  • The physical meter is calibrated with respect to an external reference, for example 0°C is when water freezes, while 100°C is when it boils. The biological sensor cannot be calibrated with respect to an external reference.
  • The physical meter produces a fixed value for a stationary signal. The biological sensor produces a dynamical signal in response to a stationary signal. More accurately, the biological sensor is a nonlinear dynamical system influenced by the physical signal.
  • The physical meter is meant to be stable, in that the mapping from physical quantity to measurement is fixed. When it is not, this is considered an error. The biological sensor does not have fixed properties. Changes in properties occur in the normal course of life, from birth to death, and some changes in properties are interpreted as adaptations, not errors.

From these differences, we realize that biological sensors do not provide physical measurements in the usual sense. The next question, then, is how can a biological system control a physical dimension with biological sensors that do not act as measurements of that dimension?

What is computational neuroscience? (XXX) Is the brain a computer?

It is sometimes stated as an obvious fact that the brain carries out computations. Computational neuroscientists sometimes see themselves as looking for the algorithms of the brain. Is it true that the brain implements algorithms? My point here is not to answer this question, but rather to show that the answer is not self-evident, and that it can only be true (if at all) at a fairly abstract level.

One line of argumentation is that models of the brain that we find in computational neuroscience (neural network models) are algorithmic in nature, since we simulate them on computers. And wouldn’t it be a sort of vitalistic claim that neural networks cannot be (in principle) simulated on a computer?

There is an important confusion in this argument. At a low level, neural networks are modelled biophysically as dynamical systems, in which the temporality corresponds to the actual temporality of the real world (as opposed to the discrete temporality of algorithms). Mathematically, those are typically differential equations, possibly hybrid systems (i.e. coupled by timed pulses), in which time is a continuous variable. Those models can of course be simulated on a computer using discretization schemes. For example, we choose a time step and compute the state of the network at time t+dt, from the state at time t. This algorithm, however, implements a simulation of the model; it is not the model that implements the algorithm. The discretization is nowhere to be found in the model. The model itself, being a continuous time dynamical system, is not algorithmic in nature. It is not described as a discrete sequence of operations; it is only the simulation of the model that is algorithmic, and different algorithms can simulate the same model.
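As a small illustration of that last point, consider the continuous-time model tau dV/dt = I - V (a passive membrane, chosen here purely as an example). The sketch below simulates it with two different algorithms, forward Euler and an exact exponential update; the function names and parameter values are illustrative assumptions. The time step dt appears only in the algorithms, never in the model, and both approximate the same continuous solution.

```python
import math

def simulate_forward_euler(v0, I, tau, dt, duration):
    """Algorithm 1: first-order discretization of tau dV/dt = I - V."""
    v = v0
    for _ in range(round(duration / dt)):
        v += dt * (I - v) / tau
    return v

def simulate_exponential_step(v0, I, tau, dt, duration):
    """Algorithm 2: exact update over each step for a constant input."""
    v = v0
    decay = math.exp(-dt / tau)
    for _ in range(round(duration / dt)):
        v = I + (v - I) * decay
    return v

def continuous_solution(v0, I, tau, t):
    """Solution of the model itself, which involves no time step."""
    return I + (v0 - I) * math.exp(-t / tau)

v0, I, tau, duration = 0.0, 1.0, 0.010, 0.020
for dt in (1e-3, 1e-4, 1e-5):
    print(dt,
          simulate_forward_euler(v0, I, tau, dt, duration),
          simulate_exponential_step(v0, I, tau, dt, duration))
print(continuous_solution(v0, I, tau, duration))
```

The Euler result approaches the continuous solution as dt shrinks, while the exponential update matches it at any of these step sizes: two distinct sequences of operations, one model.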

If we put this confusion aside, then the claim that neural networks implement algorithms becomes much less obvious. It means that trajectories of the dynamical system can be mapped to the discrete flow of an algorithm. This requires (1) identifying states with representations of some variables (for example stimulus properties, symbols), and (2) identifying trajectories from one state to another as specific operations. In addition to that, for the algorithmic view to be of any use, there should be a sequence of operations, not just one operation (i.e., describing the output as a function of the input is not an algorithmic description).

A key difficulty in this identification is temporality: the state of the dynamical system changes continuously, so how can this be mapped to discrete operations? A typical approach in neuroscience is to consider not states but properties of trajectories. For example, one would consider the average firing rate in a population of neurons in a given time window, and the rate of another population in another time window. The relation between these two rates in the context of an experiment would define an operation. As stated above, a sequence of such relations should be identified in order to qualify as an algorithm. But this mapping seems only possible within a feedforward flow; coupling poses a greater challenge for an algorithmic description. No known nervous system, however, has a feedforward connectome.

I am not claiming here that the function of the brain (or mind) cannot possibly be described algorithmically. Probably some of it can be. My point is rather that a dynamical system is not generically algorithmic. A control system, for example, is typically not algorithmic (see the detailed example of Tim van Gelder, What might cognition be if not computation?). Thus a neural dynamical system can only be seen as an algorithm at a fairly abstract level, which can probably address only a restricted subset of its function. It could be that control, which also attaches function to dynamical systems, is a more adequate metaphor of brain function than computation. Is the brain a computer? Given the rather narrow application of the algorithmic view, the reasonable answer should be: quite clearly not (maybe part of cognition could be seen as computation, but not brain function generally).

What is computational neuroscience? (XXIX) The free energy principle

The free energy principle is the theory that the brain manipulates a probabilistic generative model of its sensory inputs, which it tries to optimize by either changing the model (learning) or changing the inputs (action) (Friston 2009; Friston 2010). The “free energy” is related to the error between predictions and actual inputs, or “surprise”, which the organism wants to minimize. It has a more precise mathematical formulation, but the conceptual issues I want to discuss here do not depend on it.

Thus, it can be seen as an extension of the Bayesian brain hypothesis that accounts for action in addition to perception. It shares the conceptual problems of the Bayesian brain hypothesis, namely that it focuses on statistical uncertainty, inferring variables of a model (called “causes”) when the challenge is to build and manipulate the structure of the model. It also shares issues with the predictive coding concept, namely that there is a conflation between a technical sense of “prediction” (expectation of the future signal) and a broader sense that is more ecologically relevant (if I do X, then Y will happen). In my view, these are the main issues with the free energy principle. Here I will focus on an additional issue that is specific to the free energy principle.

The specific interest of the free energy principle lies in its formulation of action. It resonates with a very important psychological theory called cognitive dissonance theory. That theory says that you try to avoid dissonance between facts and your system of beliefs, by either changing the beliefs in a small way or avoiding the facts. When there is a dissonant fact, you generally don’t throw away your entire system of beliefs: rather, you alter the interpretation of the fact (think of political discourse or, in fact, scientific discourse). Another strategy is to avoid the dissonant facts: for example, to read newspapers that tend to have the same opinions as yours. So there is some support in psychology for the idea that you act so as to minimize surprise.

Thus, the free energy principle acknowledges the circularity of action and perception. However, it is quite difficult to make it account for a large part of behavior. A large part of behavior is directed towards goals; for example, to get food and sex. The theory anticipates this criticism and proposes that goals are ingrained in priors. For example, you expect to have food. So, for your state to match your expectations, you need to seek food. This is the theory’s solution to the so-called “dark room problem” (Friston et al., 2012): if you want to minimize surprise, why not shut off stimulation altogether and go to the closest dark room? Solution: you are not expecting a dark room, so you are not going there in the first place.

Let us consider a concrete example to show that this solution does not work. There are two kinds of stimuli: food, and no food. I have two possible actions: to seek food, or to sit and do nothing. If I do nothing, then with 100% probability, I will see no food. If I seek food, then with, say, 20% probability, I will see food.

Let’s say this is the world in which I live. What does the free energy principle tell us? To minimize surprise, it seems clear that I should sit: I am certain not to see food. No surprise at all. The proposed solution is that I have a prior expectation to see food. So to minimize surprise, I should put myself into a situation where I might see food, i.e., seek food. This seems to work. However, if there is any learning at all, then I will quickly observe that the probability of seeing food is actually 20%, and my expectations should be adjusted accordingly. I will also observe that between two food expeditions, the probability of seeing food is 0%. Once this has been observed, surprise is minimal when I do not seek food. So, I die of hunger. It follows that the free energy principle does not survive Darwinian competition.
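The argument above can be checked numerically with a rough sketch. It assumes that “surprise” is simply the negative log-probability of an observation under my current beliefs, which is a simplification of the full free energy formulation. The 0.9 prior for seeing food and the names `expected_surprise`, `fixed_prior` and `learned` are hypothetical choices made for illustration.

```python
import math

def expected_surprise(outcome_probs, beliefs):
    """Average of -log(believed probability), weighted by the true outcome probabilities."""
    return sum(p * -math.log(beliefs[o]) for o, p in outcome_probs.items() if p > 0)

# True statistics of the toy world described in the text.
WORLD = {
    "sit":  {"food": 0.0, "no food": 1.0},
    "seek": {"food": 0.2, "no food": 0.8},
}

# Hypothetical fixed prior: I expect food whatever I do (0.9 is an arbitrary value).
fixed_prior = {action: {"food": 0.9, "no food": 0.1} for action in WORLD}

# After learning, beliefs match the true statistics of each action.
learned = WORLD

def best_action(beliefs):
    """Action with the smallest expected surprise under the given beliefs."""
    return min(WORLD, key=lambda a: expected_surprise(WORLD[a], beliefs[a]))

print(best_action(fixed_prior))  # action chosen with the fixed prior
print(best_action(learned))      # action chosen once the statistics are learned
```

With these numbers, the fixed prior favors seeking food, while the learned beliefs favor sitting, since sitting then produces no surprise at all.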

Thus, either there is no learning at all and the free energy principle is just a way of calling predefined actions “priors”; or there is learning, but then it doesn’t account for goal-directed behavior.

The idea of acting so as to minimize surprise resonates with some aspects of psychology, like cognitive dissonance theory, but that does not constitute a complete theory of mind, except possibly of the depressed mind. See for example the experience of flow (as in surfing): you seek a situation that is controllable but sufficiently challenging that it engages your entire attention; in other words, you voluntarily expose yourself to a (moderate amount of) surprise; in any case certainly not a minimum amount of surprise.

Draft of chapter 6, Spike initiation with an initial segment

I have just uploaded an incomplete draft of chapter 6, "Spike initiation with an initial segment". This chapter deals with how spikes are initiated in most vertebrate neurons (and also some invertebrate neurons), where there is a hotspot of excitability close to a large soma. This situation has a number of interesting implications which make spike initiation quite different from the situation investigated by Hodgkin and Huxley, that of stimulating the middle of an axon. Most of the chapter describes the theory that I have developed to analyze this situation, called "resistive coupling theory" because the axonal hotspot is resistively coupled to the soma.

The chapter is currently unfinished, because a few points require a little more research, which we have not finished. The presentation is also a bit more technical than I would like, so this is really a draft. I wanted nonetheless to release it now, as I have not uploaded a chapter for a while and it could be some time before the chapter is finished.

What is computational neuroscience? (XXVIII) The Bayesian brain

Our sensors give us incomplete, noisy, and indirect information about the world. For example, estimating the location of a sound source is difficult because in natural contexts, the sound of interest is corrupted by other sound sources, reflections, etc. Thus it is not possible to know the position of the source with certainty. The ‘Bayesian coding hypothesis’ (Knill & Pouget, 2004) postulates that the brain represents not the most likely position, but the entire probability distribution of the position. It then uses those distributions to do Bayesian inference, for example, when combining different sources of information (say, auditory and visual). This would allow the brain to optimally infer the most likely position. There is indeed some evidence for optimal inference in psychophysical experiments – although there is also some contradicting evidence (Rahnev & Denison, 2018).
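To make the idea of combining sources of information concrete, here is a minimal sketch of the textbook case, assuming that the auditory and visual cues each provide an independent Gaussian estimate of the same position, with a flat prior. The function name and the numbers are hypothetical. Note that the structure of the model (one source, two cues) is fixed in advance.

```python
def combine_gaussian_cues(mu_a, sigma_a, mu_v, sigma_v):
    """Posterior mean and standard deviation for two independent Gaussian cues.

    Each cue is weighted by its precision (inverse variance), assuming a flat prior.
    """
    w_a = 1.0 / sigma_a ** 2   # precision of the auditory cue
    w_v = 1.0 / sigma_v ** 2   # precision of the visual cue
    mu = (w_a * mu_a + w_v * mu_v) / (w_a + w_v)
    sigma = (1.0 / (w_a + w_v)) ** 0.5
    return mu, sigma

# Hypothetical cues: audition says 10 degrees (sd 2), vision says 0 degrees (sd 1).
mu, sigma = combine_gaussian_cues(10.0, 2.0, 0.0, 1.0)
print(mu, sigma)
```

The combined estimate lies closer to the more reliable cue, and its standard deviation is smaller than that of either cue alone.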

The idea has some appeal. The problem is that, by framing perception as a statistical inference problem, it focuses on the most trivial type of uncertainty, statistical uncertainty. It is illustrated by the following quote: “The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables”. Implicit in this representation is a particular model, for which variables are defined. Typically, one model describes a particular experimental situation. For example, the model would describe the distribution of auditory cues associated with the position of the sound source. Another situation would be described by a different model, for example one with two sound sources would require a model with two variables. Or if the listening environment is a room and the size of that room might vary, then we would need a model with the dimensions of the room as variables. In any of these cases, where we have identified and fixed parametric sources of variation, the Bayesian approach works fine, because we are indeed facing a problem of statistical inference. But that framework doesn’t fit any real-life situation. In real life, perceptual scenes have variable structure, which corresponds to the model in statistical inference (there is one source, or two sources, we are in a room, the second source comes from the window, etc.). The perceptual problem is therefore not just to infer the parameters of the model (dimensions of the room, etc.), but also the model itself, its structure. Thus, it is not possible in general to represent an auditory scene by a probability distribution on a set of parameters, because the very notion of a parameter already assumes that the structure of the scene is known and fixed.

Inferring parameters for a known statistical model is relatively easy. What is really difficult, and is still challenging for machine learning algorithms today, is to identify the structure of a perceptual scene, what constitutes an object (object formation), how objects are related to each other (scene analysis). These fundamental perceptual processes do not exist in the Bayesian brain. This touches on two very different types of uncertainty: statistical uncertainty, variations that can be interpreted and expected in the framework of a model; and epistemic uncertainty, where the model itself is unknown (the difference has been famously explained by Donald Rumsfeld).

Thus, the “Bayesian brain” idea addresses an interesting problem (statistical inference), but it trivializes the problem of perception, by missing the fact that the real challenge is epistemic uncertainty (building a perceptual model), not statistical uncertainty (tuning the parameters): the world is not noisy, it is complex.

Is a thermostat conscious?

A theory of consciousness initially proposed by David Chalmers (in his book The Conscious Mind) is that consciousness (or experience) is a property of information processing systems. It is an additional property, not logically implied by physical laws; a new law of nature. The theory was later formalized by Giulio Tononi into Integrated Information Theory, based on Shannon’s mathematical concept of information. One important feature of this theory is that it is a radical form of panpsychism: it assigns consciousness (to different degrees) to virtually anything in the world, including a thermostat.

The Bewitched thought experiment

I have criticized IIT previously on the grounds that it fails to define in a sensible way what makes a conscious subject (e.g. a subsystem of a conscious entity would be another conscious entity, so for example your brain would produce an infinite number of minds). But here I want to comment specifically on the example of the thermostat. It is an interesting example brought up by Chalmers in his book. The reasoning is as follows: a human brain is conscious; a mouse brain is probably conscious, but to a somewhat lower degree (for example, no self-consciousness). As we go down the scale of information-processing systems, the system might be less and less conscious, but why would it be that there is a definite threshold for consciousness? Why would a billion neurons be conscious but not a million? Why would a million neurons be conscious but not one thousand? And how about just one neuron? How about a thermostat? A thermostat is an elementary information-processing system with just two states, so maybe, Chalmers argues, the thermostat has a very elementary form of experience.

To claim that a thermostat is conscious defies intuition, but I would not follow Searle in insisting that the theory must be wrong because it assigns consciousness to things that we wouldn’t intuitively think are conscious. As I argued in a previous post, to claim that biology tells us that only brains are conscious is to use circular arguments. We don’t know whether anything other than a brain is conscious, and since consciousness is subjective, to decide whether anything is conscious is going to involve some theoretical aspects. Nonetheless, I am skeptical that a thermostat is conscious.

I propose to examine the Bewitched thought experiment. In the TV series Bewitched, Samantha the housewife twitches her nose and everyone freezes except her. Then she twitches her nose and everyone unfreezes, without noticing that anything happened. For them, time has effectively stopped. The question is: was anyone experiencing anything during that time? To me, it is clear that no one can experience anything if time is frozen. In fact, that whole time has not existed at all for the conscious subject. It follows that a substrate with a fixed state (e.g. hot/cold) cannot experience anything, because time is effectively frozen for that substrate. Experience requires a flow of time, a change in structure through time. I leave it open whether the interaction of the thermostat with the room might produce experience for that coupled system (see below for some further thoughts).

What is “information”?

In my view, the fallacy in the initial reasoning is to put the thermostat and the brain on the same scale. That scale is the set of information-processing systems. But as I have argued before (mostly following Gibson’s arguments), it is misleading to see the brain as an information-processing system. The brain can only be seen to transform information of one kind into information of another kind by an external observer, because the very concept of information is something that makes sense to a cognitive/perceptual system. The notion of information used by IIT is Shannon information, a notion from communication theory. This is an extrinsic notion of information: for example, neural activity is informative about objects in the world in the sense that properties of those objects can be inferred from neural activity. But this is totally unhelpful to understand how the brain, which only ever gets to deal with neural signals and not things in the world, sees the world (see this argument in more detail in my paper Is coding a relevant metaphor for the brain?).

Let’s clarify with a concrete case: does the thermostat perceive temperature? The thermostat can be in different states depending on temperature, but from its perspective, there is no temperature. There are changes in state that seem to be unrelated to anything else (there is literally nothing else for the thermostat). One could replace the temperature sensor with some other sensor, or with a random number generator, and there would be literally no functional change in the thermostat itself. Only an external observer can link the thermostat’s state with temperature, so the thermostat cannot possibly be conscious of temperature.

Thus, Shannon’s notion of information is inappropriate to understand consciousness. Instead of extracting information in the sense of communication theory, what the brain might do is build models of sensory (sensorimotor) signals from its subjective perspective, in the same way as scientists make models of the world with observations (=sensory signals) and experiments (=actions). But this intrinsic notion of information, which corresponds e.g. to laws of physics, is crucially not what Shannon’s notion of information is. And it is also not the kind of information that a thermostat is dealing with.

This inappropriate notion of information leads to what in my view is a rather absurd quantitative scale of consciousness, according to which entities are more or less conscious along a graded scale (phi). Differences in consciousness are qualitative, not quantitative: there is dreaming, being awake, being self-conscious or not, etc. These are not different numbers. This odd analog scale arises because Shannon information is counted in bits. But information in the sense of knowledge (science) is not counted in bits; there are different kinds of knowledge, they have different structure, relations between them etc.

Subjective physics of a thermostat

But let us not throw away Chalmers’ interesting thought experiment just now. Let us ask, following Chalmers: what does it feel like to be a thermostat? We will examine it not with Shannon’s unhelpful notion of information but with what I called “subjective physics”: the laws that govern sensory signals and their relations to actions, from the perspective of the subject. This will define my world from a functional viewpoint. Let’s say I am a conscious thermostat; a homunculus inside the thermostat. All I can observe is a binary signal. Then there is a binary action that I can make, which for an external observer corresponds to turning on the heat. What kind of world does that make for me? Let’s say I’m a scientist homunculus, what kind of laws about the world can I infer?

If I’m a conventional thermostat, then the action will be automatically triggered when the signal is in a given state (“cold”). After some time, the binary signal will switch and so will the action. So in fact there is an identity between signal and action, which means that all I really observe is just the one binary signal, switching on and off, probably with some kind of periodicity. This is the world I might experience, as a homunculus inside the thermostat (note that to experience the periodicity requires memory, which a normal thermostat doesn’t have). In a way, I’m a “locked-in” thermostat: I can make observations, but I cannot freely act.

Let’s say that I am not locked-in and have a little more free will, so I can decide whether to act (heat) or not. If I can, then my world is a little bit more interesting: my action can trigger a switch of the binary signal, after some latency (again requiring some memory), and then when I stop, the binary signal switches back, after a time that depends on how much time my previous action lasted. So here I have a world that is much more structured, with relatively complex laws which in a way define the concept of “temperature” from the perspective of the thermostat.
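The laws that the free thermostat could discover can be illustrated with a toy simulation. This is only a sketch under strong assumptions: the room warms and cools at constant rates, and the sensor reports a single bit. All names and parameter values (`heat_rate`, `cool_rate`, `threshold`, the initial temperature) are hypothetical. The homunculus only ever sees the booleans; the temperature variable exists for the external observer.

```python
def simulate(actions, temp=18.0, threshold=20.0, heat_rate=0.5, cool_rate=0.25):
    """Return the binary signal (True = "warm") observed at each time step.

    `actions` is a sequence of booleans: True means "heat" at that step.
    """
    signals = []
    for heating in actions:
        temp += heat_rate if heating else -cool_rate
        signals.append(temp >= threshold)
    return signals

def switch_on_latency(max_steps=100):
    """Number of heating steps before the signal first switches on."""
    return simulate([True] * max_steps).index(True) + 1

def switch_back_time(heat_steps, rest_steps=100):
    """Steps after I stop heating until the signal switches back off.

    Only meaningful if heat_steps is long enough for the signal to switch on.
    """
    signals = simulate([True] * heat_steps + [False] * rest_steps)
    return signals[heat_steps:].index(False) + 1

print(switch_on_latency())   # latency between starting to act and the switch
print(switch_back_time(6))   # short action
print(switch_back_time(10))  # longer action
```

From the inside, the law reads: the longer I act, the longer the signal stays on after I stop. No temperature appears in that law, only relations between actions, signals and durations.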

So if a thermostat were conscious, then we have a rough idea of the kind of world it might experience (although not what it feels like), and even in this elementary example, you can’t measure these experiences in bits - let alone the fact that a thermostat is not conscious anyway.

What does Gödel's theorem mean?

Gödel's theorem is a result in mathematical logic, which is often stated as showing that “there are true things that cannot be proved”. It is sometimes used to comment on the limits of science, or the superiority of human intuition. Here I want to clarify what this theorem means and what the epistemological implications are.

First, this phrasing is rather misleading. It makes the result sound almost mystical. If you phrase the result differently, by avoiding the potentially confusing reference to truth, the result is not that mystical anymore. Here is how I would phrase it: you can always add an independent axiom to a finite system of axioms. This is not an obvious mathematical result, but I wouldn't think it defies intuition.

Why is this equivalent to the first phrasing? If the additional axiom is independent of the set of axioms, then it cannot be proved from them (by definition). Yet as a logical proposition it has to be either true or not true. So it is true, or its negation is true, but it cannot be proved. What is misleading in the first phrasing is that the statement “there are true things” is contextual. I can start from a set of axioms and add one, and that new one will be true (since it's an axiom). Instead I could add its negation, and then that one will be true. That the proposition is true is not a universal truth, as it would seem with the phrasing “there are true things”. It is true in a particular mathematical world, and you can consider another one where it is not true. Famous examples are Euclidean and non-Euclidean geometries, which are mutually inconsistent sets of axioms.

So, what Gödel's theorem says is simply that no finite system of axioms is complete, in the sense that you can always add one without making the system inconsistent.

What are the epistemological implications? It does not mean that there are things that science cannot prove. Laws of physics are not proved by deduction anyway. They are hypothesized and empirically tested, and all laws are provisional. Nevertheless, it does raise some deep philosophical questions, which have to do with reductionism. I am generally critical of reductionism, but more specifically of methodological reductionism, the idea that a system can be understood by understanding the elements that compose it. For example: understand neurons and you will understand the brain. I think this view is wrong, because it is the relations between neurons, at the scale of the organism, which make a brain. The right approach is systemic rather than reductionist. Many scientists frown at criticisms of reductionism, but this is only because they confuse methodological and ontological reductionism. Ontological reductionism means that reality can be reduced to a small number of types of things (e.g. atoms) and laws, and everything can be understood in these terms. For example, the mind can in principle be understood in terms of interactions of atoms that constitute the brain. Most scientists seem to believe in ontological reductionism.

Let us go back now to Gödel's theorem. An interesting remark made by theoretical biologist Robert Rosen is that Gödel's theorem makes ontological reductionism implausible to him. Why? The theorem says that, whatever system of axioms you choose, it will always be possible to add one which is independent. Let us say we have agreed on a small set of fundamental physical laws, with strong empirical support. To establish each law, we postulate it and test it empirically. At a macroscopic level, scientists postulate and test all sorts of laws. How can we claim that any macroscopic law necessarily derives from the small set of fundamental laws? Gödel's theorem says that there are laws that you can express but that are independent of the fundamental laws. This means that there are laws that can only be established empirically, not formally, in fact just like the set of fundamental laws. Of course it could be the case that most of what matters to us is captured by a small set of laws. But maybe not.

A brief critique of predictive coding

Predictive coding is becoming a popular theory in neuroscience (see for example Clark 2013). In a nutshell, the general idea is that brains encode predictions of their sensory inputs. This is an appealing idea because superficially, it makes a lot of sense: functionally, the only reason why you would want to process sensory information is if it might impact your future, so it makes sense to try to predict your sensory inputs.

There are substantial problems in the details of predictive coding theories, for example with the arbitrariness of the metric by which you judge that your prediction matches sensory inputs (what is important?), or the fact that predictive coding schemes encode both noise and signal. But I want to focus on the more fundamental problems. One has to do with “coding”, the other with “predictive”.

It makes sense that brains anticipate. But does it make sense that brains code? Coding is a metaphor of a communication channel, and this is generally not a great metaphor for what the brain might do, unless you fully embrace dualism. I discuss this at length in a recent paper (Is coding a relevant metaphor for the brain?) so I won’t repeat the entire argument here. Predictive coding is a branch of efficient coding, so the same fallacy underlies its logic: 1) neurons encode sensory inputs; 2) living organisms are efficient; => brains must encode efficiently. (1) is trivially true in the sense that one can define a mapping from sensory inputs to neural activity. (2) is probably true to some extent (evolutionary arguments). So the conclusion seems to follow. Critiques of efficient coding have focused on the “efficient” part: maybe the brain is not that efficient after all. But the error is elsewhere: living organisms are certainly efficient, but it doesn’t follow that they are efficient at coding. They might be efficient at surviving and reproducing, and it is not obvious that it entails coding efficiency (see the last part of the above-mentioned paper for a counter-example). So the real strong assumption is there: the main function of the brain is to represent sensory inputs.

The second problem has to do with “predictive”. It makes sense that an important function of brains, or in fact of any living organism, is to anticipate (see the great Anticipatory Systems by Robert Rosen). But to what extent do predictive coding schemes actually anticipate? First, in practice, those are generally not prediction schemes but compression schemes, in the sense that they do not tell us what will happen next but what happens now. This is at least the case of the classical Rao & Ballard (1999). Neurons encode the difference between expected input and actual input: this is compression, not prediction. It uses a sort of prediction in order to compress: other neurons (in higher layers) produce predictions of the inputs to those neurons, but the term prediction is used in the sense that the inputs are not known to the higher layer neurons, not that the “prediction” occurs before the inputs. Thus the term “predictive” is misleading because it is not used in a temporal sense.
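The distinction can be illustrated with a deliberately simplified sketch. It is not the Rao & Ballard model, which is hierarchical and iterative; it is a one-layer linear toy in which a higher level summarizes the current input with a single scalar “cause”, and lower-level units transmit only the residual. The weight vector and input values are hypothetical.

```python
def infer_cause(x, weights):
    """Least-squares estimate of a scalar cause r such that x is close to r * weights."""
    return sum(w * xi for w, xi in zip(weights, x)) / sum(w * w for w in weights)

def residual_code(x, weights):
    """Encode the current input as (cause, top-down prediction, residual error)."""
    r = infer_cause(x, weights)
    prediction = [w * r for w in weights]               # "prediction" of the input at the same time step
    error = [xi - pi for xi, pi in zip(x, prediction)]  # what the error units transmit
    return r, prediction, error

weights = [1.0, 2.0, 1.0]   # hypothetical generative weights
x_now = [3.0, 3.0, 3.0]     # input at the current time step

r, prediction, error = residual_code(x_now, weights)
reconstruction = [p + e for p, e in zip(prediction, error)]
print(r, prediction, error, reconstruction)
```

Prediction plus residual gives back the current input exactly. Nothing in this code refers to the input at the next time step: it is a way of re-describing the present.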

However, it is relatively easy to imagine how predictive coding might be about temporal predictions, although the neural implementation is not straightforward (delays, etc.). So I want to make a deeper criticism. I started by claiming that it is useful to predict sensory inputs. I am taking this back (I can because I said it was superficial reasoning). It is not useful to know what will happen. What is useful is to know what might happen, depending on what you do. If there is nothing you can do about the future, what is the functional use of predicting it? So what is useful is to predict the future conditionally on a set of different potential actions. This is about manipulating models of the world, not representing the present.