
The machine learning analogy of perception

To cast the problem of neural computation in sensory systems, one often refers to the standard framework in machine learning. A typical example is as follows: there is a dataset, which could be for example a set of images, and the goal is to learn a mapping between these images and categories, for example faces or cars. In the learning phase, labels are externally given to these images, and the machine learning algorithm builds a mapping between images and labels. As an analogy of what sensory systems do, the question is then: how do neurons learn this mapping, e.g. to fire when they are presented with an image of a given category? This question is the starting point of many theories in computational neuroscience. It is essentially an inference problem: to each category corresponds a distribution of images, and so what sensory systems must do is learn this distribution and compute what the most likely category is for a given presented image. This is why Bayesian approaches are appealing from this point of view, because an efficient sensory system should then be an ideal Bayesian observer. It just follows from the way the problem of perception is cast.
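To make the analogy concrete, here is a minimal sketch of the ideal Bayesian observer implied by this framing (in Python, with made-up Gaussian “image” distributions and two hypothetical categories; nothing here refers to an actual model): each category corresponds to a distribution of inputs, and perception amounts to computing the most probable category for a presented input.

```python
import numpy as np

# Hypothetical example: each category generates "images" (here, 2D feature
# vectors) from its own Gaussian distribution. The ideal Bayesian observer
# knows these distributions and picks the most probable category (MAP).
rng = np.random.default_rng(0)
means = {"face": np.array([1.0, 0.0]), "car": np.array([-1.0, 0.5])}
priors = {"face": 0.5, "car": 0.5}
sigma = 1.0  # common isotropic noise

def log_posterior(x, category):
    # log p(category | x) up to a constant: log p(x | category) + log p(category)
    d = x - means[category]
    return -np.dot(d, d) / (2 * sigma**2) + np.log(priors[category])

def classify(x):
    return max(means, key=lambda c: log_posterior(x, c))

# A "presented image" drawn from the face distribution:
x = means["face"] + sigma * rng.standard_normal(2)
print(classify(x))  # most likely category given the learned distributions
```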

But is this actually a good analogy? In fact, it differs from the problems sensory systems actually face in at least three important ways:

1) elements of the data set are considered independent;

2) these elements are externally given;

3) the labels are externally defined.

First of all, elements of the data set are never independent in a real perceptual system. On the contrary, there is a continuous flow of sensory input. Vision is not a slideshow. The visual field changes through time in a continuous way, and more importantly the changes are lawful because objects are embedded in the physical world. We can perceive these laws, for example the rigidity of movements, and this is something that cannot be found in the “slideshow” view of vision that is implied by the machine learning analogy. I believe this is the main message of James Gibson. Moreover, there are lawful relationships in the sensory inputs, but there are also sensorimotor relationships. This is information that can be picked up from the sensory or sensorimotor flow, not by inference from the distribution of slides in the slideshow. This means that perception is not (or not only) inferential but relational: sensory inputs are analyzed in reference to themselves (their internal structure), and not (only) to memory.

A second point is that in the machine learning analogy, elements of the dataset are considered given, and the algorithm reacts to them. In psychology, this view corresponds to behaviorism, in which the organism is only considered from a stimulus-reaction point of view. But in fact a more ecologically accurate view is that data are in general produced by the actions of the organism, rather than passively received. Gibson criticized the information processing viewpoint for this reason: the world does not produce messages to be decoded by a receiver; on the contrary, a perceptual system samples its environment. It is really the opposite view: the organism does not react to a stimulus, but rather the environment reacts to the actions of the organism, and it is this reaction that is analyzed by the organism. In the machine learning field, there is a framework that tries to address this aspect, named “active learning”: the algorithm chooses a data element and asks for its label, for example so as to maximize the information that can be gained.
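As an illustration of the active learning idea (a toy sketch; the probabilistic classifier `predict_proba` and the pool of unlabeled examples are hypothetical): the algorithm queries the label of the example it is currently most uncertain about.

```python
import numpy as np

def query_index(predict_proba, unlabeled_pool):
    """Pick the unlabeled example whose predicted class probabilities are
    closest to uniform (maximum entropy), i.e. the most informative query."""
    probs = np.array([predict_proba(x) for x in unlabeled_pool])
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return int(np.argmax(entropy))

# Toy usage with a made-up classifier that is unsure near x = 0:
predict_proba = lambda x: np.array([1 / (1 + np.exp(-x)), 1 / (1 + np.exp(x))])
pool = [-3.0, -0.1, 2.5]
print(query_index(predict_proba, pool))  # -> 1, the most ambiguous example
```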

Finally, in the machine learning analogy, the label is externally defined. But in a closed system, this is not possible. The organism must define the relevant categories by itself. But how can these categories be a priori defined? Often, this problem is dismissed with what I would call “evolutionary magic”: these categories are provided by “evolution” because they are important to the survival and reproduction of the animal. I call it “magic” because the teleological argument does not provide any explanation at all: it is about as metaphysical as if “evolution” were replaced by “God”, in the sense that it has the same explanatory power. Invoking intergenerational changes of the organism does not solve the problem: whatever mechanism is involved, pressure for change still has to come from the environment and the way the organism can interact with it, not from an external source.

In fact, this problem was addressed by the development of phenomenology in philosophy, introduced by Husserl about a century ago. Followers of the phenomenological approach include Merleau-Ponty and Sartre. The idea is the following. What “really” exists in the world is a metaphysical question: it actually does not matter for the organism if it makes no difference to its experience. For example, is there such a thing as “absolute space”, the existence of an absolute location of things? The question is metaphysical because only relative changes in space can be experienced (the relative location of things) – this point was noted by Henri Poincaré. In the phenomenological approach, “essence” is what remains invariant under changes of perspective. I believe this is related to a central point in Gibson’s theory: information is given in the “structural invariants” present in the sensory inputs. These invariants do not need an external reference to be noticed.

For example, consider a sound source that produces two acoustical waves at the two ears. Neglecting sound diffraction, these acoustical waves are identical apart from a propagation delay (the interaural time difference or ITD). When a sound is produced by the source, this property is invariant through time – it is a law that is always satisfied. But what makes it a spatial property? It is spatial because the property is broken when movements are produced by the organism (e.g. head movements). In addition, there is a higher-order property, which is the relationship between the interaural delay and the movements of the head, and which holds as long as the source does not move. This structural invariant is then information about the location of the sound source; in fact, the relationship can be mapped to the physical location of the source. But the “label” here is intrinsically defined: it is precisely the relationship between head position and ITD. Thus labels can be intrinsically defined, as the sensory and sensorimotor structure. This is the postulate of the sensorimotor account of perception, according to which perception is precisely the anticipated effect of the organism’s action on the sensory inputs.
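As a rough illustration (a far-field approximation with made-up numbers, not a claim about the actual auditory system), the ITD can be written as a lawful function of the source direction relative to the head, so that head rotations change the ITD in a predictable way; the intrinsic “label” is this relationship, not an absolute angle.

```python
import numpy as np

# Far-field approximation: the ITD depends on the source direction relative
# to the head midline. Numbers are illustrative (inter-ear distance ~20 cm).
a = 0.10   # half inter-ear distance (m), assumed
c = 343.0  # speed of sound (m/s)

def itd(source_azimuth, head_azimuth):
    # relative angle between source and head midline
    return (2 * a / c) * np.sin(source_azimuth - head_azimuth)

source = np.deg2rad(30.0)              # fixed (unknown to the organism)
head_angles = np.deg2rad([0, 10, 20, 30])
print([f"{itd(source, h)*1e6:.0f} us" for h in head_angles])
# The lawful relationship head angle -> ITD (zero ITD when facing the source)
# is the intrinsic "label" for source direction.
```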

The fact that these labels can be intrinsically defined is, I believe, what James Gibson means when he states that information is “picked-up” and that perception is “direct”. But I would like to go further: there is no doubt that there can be inference in perception, and so in that sense perception cannot be entirely direct. For example, one can visually recognize an object that is partially occluded, and imagine the rest of the object (“amodal perception”). But the point is that what is inferred, i.e., the “label” in the machine learning terminology, is not an externally given category, but the sensory or sensorimotor structure, part of which is hidden. The main difference is that there is no need for an external reference. For example, in the sound localization example, a brief sound may be presented from a given direction. Then the sensorimotor structure that defines source direction for the organism is hidden, since the sound is over by the time the organism can turn its head. So this structure is inferred from the ITD. In other words, what is inferred is not an angle, which would make no sense for an animal that has no measurement tool; it is the effect of its own movements on the perceived ITD. So there is inference, but inference is not the basis of perception. It cannot be, for how would you know what should be inferred? For this reason, Gibson rejects inference, arguing that it would lead to infinite regress. As I have tried to explain, it is not inference per se that is problematic, but the idea that it might be the basis of perception.

This is quite important for our view of neural computation: this means that Bayesian inference is not so central anymore in the function of sensory systems. Certainly, inference is useful and perhaps necessary in many cases. But perhaps more important is the discovery of sensory and sensorimotor structure, that is, the elaboration of what is to be inferred. This requires the development of a theory of neural computation that is primarily relational rather than inferential.

In summary, labels can be intrinsically defined by the invariant structure of sensory and sensorimotor signals. I would like to end this post with another important Gibsonian notion: “affordances”. Gibson thought that we perceive “affordances”, which are what the objects of perception allow in terms of interaction. For example, a door affords opening, a wall affords blocking, etc. This is an important notion, because it defines meaning in terms of things that make sense to the organism, rather than in externally defined terms.

To conclude, a theory of neural computation that takes into account these points should differ from standard theories in the following way: it should be

1) relational (discovering internal structure) rather than inferential (comparing with memory),

2) active (inputs are not questions but answers) rather than passive (inputs are questions, actions are answers), and

3) subjective (meaning is defined by the interaction with the environment) rather than objective (objects are externally defined).

Rate vs. timing (XI) Flavors of spike-based theories (1) Synfire chains

I will now give a brief overview of different spike-based theories. This is not meant to be an exhaustive review – although I would be very glad to have some feedback on what other theories would deserve to be mentioned. My aim is rather to highlight a common theme in these theories, which distinguishes them from rate-based theories. At the same time, I also want to emphasize the diversity of these theories.

“Synfire chains” were introduced by Moshe Abeles in the 1980s (see his 1991 book: Corticonics: Neural Circuits of the Cerebral Cortex), although in fact the concept can be traced back to 1963 (Griffith, Biophysical Journal 1963). The concept is based on the observation that neurons are extremely sensitive to coincidences in their inputs (in the fluctuation-driven regime) – I commented on this fact in a previous post. So if a small number of input spikes (on the order of ten) are received synchronously by a neuron, then the neuron spikes. Now if these presynaptic neurons also have postsynaptic neurons in common, then these postsynaptic neurons will also fire synchronously. By this mechanism, synchronous activity propagates from one group of neurons to another group. This is a feedforward mode of synchronous propagation along a chain of layers (hence the term “synfire chain”), but note that this feedforward mode of propagation may be anatomically embedded in a recurrent network of neurons. It is important to note that this propagation mode 1) can only be stable in the fluctuation-driven regime (as opposed to the mean-driven regime), 2) is never stable for perfect integrators (without the leak current). A simple explanation of the phenomenon was presented by Diesmann et al. (Nature 1999), in terms of dynamical systems theory (synchronous propagation is a stable fixed point of the dynamics). In that paper, neurons are modeled with background noise – it is not a deterministic model. Much earlier, in 1963, Griffith presented the deterministic theory of synfire chains (without the name) in discrete time and with binary neurons (although he did consider continuous time at the end of the paper). He also considered the inclusion of inhibitory neurons in the chain, and showed that it could yield stable propagation modes with only a fraction of neurons active in each layer. Izhikevich extended the theory of synfire chains to synchronous propagation with heterogeneous delays, which he termed “polychronization” (I will discuss it later).
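For readers who want to see the mechanism at work, here is a minimal sketch of synchronous propagation along a feedforward chain of leaky integrate-and-fire neurons (plain Python/NumPy, with arbitrary parameters chosen for illustration; it is not the model of any of the papers cited above):

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, n_per_layer = 10, 100
dt, t_max, tau = 0.1, 60.0, 10.0          # ms
threshold, reset = 1.0, 0.0
w, delay = 0.02, 2.0                      # ~50 coincident spikes needed to reach threshold
sigma = 0.2                               # background noise (subthreshold fluctuations)

steps = int(t_max / dt)
delay_steps = int(delay / dt)
v = np.zeros((n_layers, n_per_layer))
has_fired = np.zeros((n_layers, n_per_layer), dtype=bool)
arrivals = np.zeros((n_layers, steps + delay_steps + 1))   # spike counts arriving per time bin
spike_times = [[] for _ in range(n_layers)]

# initial synchronous volley: 100 jittered input spikes delivered to layer 0 around t = 5 ms
for t_spk in 5.0 + rng.normal(0.0, 0.3, n_per_layer):
    arrivals[0, int(t_spk / dt)] += 1

for step in range(steps):
    # leaky integration plus weak background noise
    v += dt * (-v) / tau + sigma * np.sqrt(dt / tau) * rng.standard_normal(v.shape)
    # every neuron of a layer receives all spikes arriving at that layer (shared input)
    v += w * arrivals[:, step][:, None]
    fired = (v >= threshold) & ~has_fired
    v[fired] = reset
    has_fired |= fired                    # each neuron fires at most once (for simplicity)
    for layer, neuron in zip(*np.where(fired)):
        spike_times[layer].append(step * dt)
        if layer + 1 < n_layers:          # forward the spike to the next layer after a delay
            arrivals[layer + 1, step + delay_steps] += 1

for layer, times in enumerate(spike_times):
    if times:
        print(f"layer {layer}: {len(times)} spikes, mean time {np.mean(times):.1f} ms (sd {np.std(times):.2f} ms)")
```

With these parameters, the jittered volley injected into the first layer propagates from layer to layer with a roughly constant delay, while the background noise alone almost never makes a neuron fire.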

As I have described them so far, synfire chains are postulated to result from the dynamics of neural networks, given what we know of the physiology of neurons. This raises two questions: 1) do they actually exist in the brain? and 2) are they important? I do not think there is a definite empirical answer to the first question (especially considering possible variations of the theory), but it is interesting to consider its rejection. If one concludes, on the basis of empirical evidence, that synfire chains do not exist, then another question pops up: why are they not observed, given that they seem to be implied by what we know of neural physiology? Possibly then, there could be specific mechanisms to avoid the propagation of synchronous activity (e.g. cancellation of expected correlations with inhibition, which I mentioned in a previous post). I am not taking sides here, but I would simply like to point out that, because the existence of synfire chains is deduced from generally accepted knowledge, the burden of proof should in fact be shared between the proponents of synfire chains and their opponents: either synfire chains exist, or there is a mechanism to prevent them from existing (or at least their non-existence deserves an explanation). Before someone objects that models of recurrent spiking neural networks do not generally display synfire activity, I would like to point out that these are generally sparse random networks (i.e., with no short cycle in the connectivity graph) in which the existence of the irregular state is artificially maintained by external noisy input (e.g. Brunel (2000)), or by a suprathreshold intrinsic current (e.g. Vogels and Abbott 2005).

The second question is functional: what might be the computational advantage of synfire chains? As I understand it, they were not introduced for computational reasons. However, a computational motivation was later proposed in reference to the binding problem, or more generally compositionality, by Bienenstock (1996, in Brain Theory). A similar proposition was also made by Christoph von der Malsburg and Wolf Singer, although in the context of oscillatory synchrony, not synfire chains. A scene is composed of objects that have relationships to each other. Processing such a scene requires identifying the properties of objects, but also identifying the relationships between these objects, e.g. that different features belong to the same object. In a rate-based framework, the state of the network at a given time is given by a vector of rates (one scalar value per neuron). If the activation of each neuron or set of neurons represents a given property, then the network can only represent an unstructured set of properties. The temporal organization of neural activity may then provide the required structure. Neurons firing in synchrony (at a given timescale) may then represent properties of the same object. The proposition makes sense physiologically because presynaptic neurons can only influence a postsynaptic target if they fire together within the temporal integration window of the postsynaptic neuron (where synchrony is seen from the postsynaptic viewpoint, i.e., after axonal propagation delays). In my recent paper on computing with neural synchrony (which is not based on synfire chains but on stimulus-specific synchrony), I show on an olfactory example how properties of the same object can indeed be bound together by synchrony. I also note that it provides a way to filter out irrelevant signals (noise), because these are not temporally coherent.

Apart from compositionality, the type of information processing performed by synfire chains is very similar to that of feedforward networks in classical neural network theory. That is, the activation (probability of firing) of a given unit is essentially a sigmoidal function of a weighted sum of activations in the preceding layer (if heterogeneous synaptic weights are included) – neglecting the temporal dispersion of spikes. But there may be another potential computational advantage of synfire chains, compared to traditional rate-based feedforward models, which is processing speed. Indeed, in synfire chains, the propagation speed is limited by axonal conduction delays, not by neural integration time.

In the next post, I will comment on polychronization, an extension of synfire chains that includes heterogeneous delays.

Reader's digest (12 Dec 2012)

I am starting a new series of posts on this blog, called “Reader’s digest”. These are simply bibliographical notes on recent readings (of recent or old papers).

While reading Abeles’s Scholarpedia entry on synfire chains, I learned that although Abeles introduced the term “synfire chain”, the concept is in fact older. It seems that it dates back to Griffith in 1963 (Griffith 1963). In that paper, he considers threshold binary neurons, in discrete time. Previous authors showed that networks of such neurons could only be in two stationary states: quiescent or fully active (Beurle 1956; Ashby et al. 1962), and this was seen as a paradox, since apparently this is not what happens in the nervous system. This is reminiscent of a problem that is still present in the current literature: the stability of the persistent irregular state in closed networks of spiking neurons (indeed most models use either external noise to maintain activity (Brunel 2000) or a suprathreshold intrinsic current, as in (Vogels & Abbott 2005)).

In his paper, Griffith introduces a “transmission line”, which is exactly a synfire chain, except with discrete rather than continuous time (an approximation that he acknowledges and even reconsiders at the end of the paper). He demonstrates (with calculations involving binomial distributions and the central limit theorem) that indeed there is only a single stable mode of propagation (all neurons active) but that if inhibitory neurons are also included, then there may be another stable mode, with only a fraction of neurons being active. He also mentions the possibility of unstable oscillations due to inhibition (which corresponds to what is now called the “PING” mechanism, pyramidal-interneuron gamma).
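To give a flavor of this kind of calculation (a sketch with arbitrary numbers, not Griffith's actual derivation): if each neuron of a layer fires independently with probability p, receives connections from w neurons of the previous layer and needs at least m coincident input spikes to fire, then the activity of the next layer is a binomial tail probability, and propagation is described by iterating this map.

```python
from math import comb

def next_activity(p, w=50, m=10):
    """Probability that a neuron fires, given that each of its w inputs from
    the previous layer fired independently with probability p and that at
    least m coincident input spikes are needed (binomial tail)."""
    return sum(comb(w, k) * p**k * (1 - p)**(w - k) for k in range(m, w + 1))

for p0 in (0.1, 0.2, 0.3):              # initial fraction of active neurons
    p = p0
    for _ in range(10):                  # propagate through 10 layers
        p = next_activity(p)
    print(f"start {p0:.1f} -> after 10 layers {p:.3f}")
```

With these (arbitrary) numbers, activity either dies out or saturates, which is exactly the dichotomy (quiescent or fully active) that the inclusion of inhibition is meant to escape.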

The paper is interesting for two reasons: 1) it includes methods of calculation relevant for synfire chain propagation, 2) it seems to provide a solution to the problem of the stability of irregular activity, based on a small number of strong inhibitory neurons. This latter point applies both to synfire chains and to more traditional recurrent networks.

 

References

Ashby, W.R., von Foerster, H. & Walker, C.C., 1962. Instability of Pulse Activity in a Net with Threshold. Nature, 196(4854), p. 561-562.

Beurle, R.L., 1956. Properties of a Mass of Cells Capable of Regenerating Pulses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 240(669), p. 55-94.

Brunel, N., 2000. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J Comput Neurosci, 8(3), p. 183.

Griffith, J.S., 1963. On the Stability of Brain-Like Structures. Biophysical Journal, 3(4), p. 299-308.

Vogels, T.P. & Abbott, L.F., 2005. Signal Propagation and Logic Gating in Networks of Integrate-and-Fire Neurons. The Journal of Neuroscience, 25(46), p. 10786-10795.

What is computational neuroscience? (VII) Incommensurability and relativism

I explained in previous posts that new theories should not be judged by their agreement with the current body of empirical data, because these data were produced by the old theory. In the new theory, they may be interpreted very differently or even considered irrelevant. A few philosophers have gone so far as to state that different theories are incommensurable, that is, they cannot be compared with each other because they have different logics (e.g. observations are not described in the same way in the different theories). This reasoning may lead to relativistic views of science, that is, the idea that all theories are equally “good” and that the choice between them is a matter of personal taste or fashion. In this post I will try to explain the arguments, and also to reject relativism.

In “Against Method”, Feyerabend explains that scientific theories are defined in a relational way, that is, elements of a theory make sense only in reference to other elements of the theory. I believe this is a very deep remark that applies to theories of knowledge in the broadest sense, including perception for example. Below, I drew a schematic figure to illustrate the arguments.

Theories are systems of thought that relate to the world. Concepts in a theory are meant to relate to the world, and they are defined with respect to other concepts in the theory. A given concept in a given theory may have a similar concept in another theory, but it is a different concept, in general. To explain his arguments, Feyerabend uses the analogy of language. It is a good analogy because languages relate to the world, and they have an internal relational structure. Imagine theories A and B are two languages. A word in language A is defined (e.g. in the dictionary) by using other words from language A. A child learns her native language by picking up the relationship between the words, and how they relate to the world she can see. To understand language A, a native speaker of language B may translate the words. However, translation is not definition. It is imprecise because a word and its translation often do not have exactly the same meaning in the two languages. Some words may not even exist in one language. A deeper understanding of language A requires going beyond translation and capturing the meaning of words by acquiring a more global understanding of the language, both in its internal structure and in its relationship with the world.

Another analogy one could make is political theories, in how they view society. Clearly, a given observation can be interpreted in opposite ways in conservative and liberal political views. For example, the same economic crisis could be seen as the result of public debt or as the result of public cuts in spending (due to public acquisition of private debt).

These analogies support the argument that an element of a new theory may not be satisfactorily explained in the framework of the old theory. It may only make full sense when embedded in the full structure of the new theory – which means that new theories may be initially unclear and that the concepts may not be well defined. This remark can certainly make different theories difficult to compare, but I would not conclude that theories are incommensurable. This conclusion would be valid if theories were closed systems, because then a given statement would make no sense elsewhere than in the context of the theory in which it is formulated. Axiomatic systems in mathematics could be said to be incommensurable (for example, Euclidian and non-Euclidian geometries). But theories of knowledge, unlike axiomatic systems, are systems that relate to the world, and the world is shared between different theories (as illustrated in the drawing above). For this reason, translation is imprecise but not arbitrary, and one may still assess the degree of consistency between a scientific theory and the part of the world it is meant to explain.

One may find an interesting example in social psychology. In the theory of cognitive dissonance, new facts that seem to contradict our belief system are taken into account by minimally adjusting that belief system (minimizing the “dissonance” between the facts and the theory). In philosophy of knowledge, these adjustments would be called “ad hoc hypotheses”. When it becomes too difficult to account for all the contradictory facts (making the theory too cumbersome), the belief system may ultimately collapse. This is very similar to the theory of knowledge defended by Imre Lakatos, where belief systems are replaced by research programs. Cognitive dissonance theory was introduced following a field study of a small American sect that believed that the end of the world would occur at a specific date (Festinger, Riecken and Schachter (1956), When Prophecy Fails. University of Minnesota Press). When the said date arrived and the world did not end, strangely enough, the sect did not collapse. On the contrary, the failed prophecy made the group stronger, with the followers believing even more firmly in their view of the world. They considered that the world did not end because they prayed so much and God heard their prayers and postponed the event. So they made a new prediction, which of course turned out to be false. The sect ultimately collapsed, although only after a surprisingly long time.

The example illustrates two points. Firstly, a theory does not collapse because one prediction is falsified. Instead, the theory is adjusted with a minor modification so as to account for the seemingly contradicting observation. But this process does not go on forever, because of its interaction with the world: when predictions are systematically falsified, the theory ultimately loses its followers, and for a good reason.

In summary, a theory of knowledge is a system in interaction with the world. It has an internal structure, and it also relates to the world. And although it may relate to the world in its own words, one may still assess the adequacy of this relationship. For this reason, one may not defend scientific relativism in its strongest version.

For the reader of my other posts in this blog, this definition of theories of knowledge might sound familiar. Indeed it is highly related to theories of perception defended by Gibson, O’Regan and Varela, for example. After all, perception is a form of knowledge about the world. These authors have in common that they define perception in a relational way, as the relationship between the actions of the organism in the world (driven by “theory”) and the effects of these actions on the organism (“tests” of the theory). This is in contrast with “neurophysiological subjectivism”, for which meaning is intrinsically produced by the brain (a closed system, in my drawing above), and “computational objectivism”, in which there is a pre-defined objective world (related to the idea of translation).

What is computational neuroscience? (VI) Deduction, induction, counter-induction

At this point, it should be clear that there is not a single type of theoretical work. I believe most theoretical work can be categorized into three broad classes: deduction, induction, and counter-induction. Deduction is deriving theoretical knowledge from previous theoretical knowledge, with no direct reference to empirical facts. Induction is the process of making a theory that accounts for the available empirical data, in general in a parsimonious way (Occam’s razor). Counter-induction is the process of making a theory based on non-empirical considerations (for example philosophical principles or analogy) or on a subset of empirical observations that are considered significant, and re-interpreting empirical facts so that they agree with the new theory. Note that 1) all these processes may lead to new empirical predictions, 2) a given line of research may use all three types of processes.

For illustration, I will discuss the work done in my group on the dynamics of spike threshold (see these two papers with Jonathan Platkiewicz: “A Threshold Equation for Action Potential Initiation” and “Impact of Fast Sodium Channel Inactivation on Spike Threshold Dynamics and Synaptic Integration”). It is certainly not the most well-known line of research and therefore it will require some explanation. However, since I know it so well, it will be easier to highlight the different types of theoretical thinking – I will try to show how all three types of processes were used.

I will first briefly summarize the scientific context. Neurons communicate with each other by spikes, which are triggered when the membrane potential reaches a threshold value. It turns out that, in vivo, the spike threshold is not a fixed value even within a given neuron. Many empirical observations show that it depends on the stimulation, and on various aspects of the previous activity of the neuron, e.g. its previous membrane potential and the previously triggered spikes. For example, the spike threshold tends to be higher when the membrane potential was previously higher. By induction, one may infer that the spike threshold adapts to the membrane potential. One may then derive a first-order differential equation describing the process, in which the threshold adapts to the membrane potential with some characteristic time constant.  Such phenomenological equations have been proposed in the past by a number of authors, and it is qualitatively consistent with a number of properties seen in the empirical data. But note that an inductive process can only produce a hypothesis. The data could be explained by other hypotheses. For example, the threshold could be modulated by an external process, say inhibition targeted at the spike initiation site, which would co-vary with the somatic membrane potential. However, the hypothesis could potentially be tested. For example, an experiment could be done in which the membrane potential is actively modified by an electrode injecting current: if threshold modulation is external, spike threshold should not be affected by this perturbation. So an inductive process can be a fruitful theoretical methodology.
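As a minimal sketch of such a phenomenological model (generic first-order form with illustrative parameters, not the equation of the papers discussed below): the threshold relaxes, with time constant tau_theta, towards a value that increases with the membrane potential.

```python
import numpy as np

# Generic adaptive-threshold model (illustrative parameters):
#   tau_theta * dtheta/dt = theta_inf(V) - theta,  with theta_inf(V) = theta0 + a*V
dt, t_max = 0.1, 200.0                    # ms
tau_theta, theta0, a = 20.0, -45.0, 0.2   # ms, mV, dimensionless slope

t = np.arange(0.0, t_max, dt)
V = -70.0 + 10.0 * (t > 50.0)             # step depolarization of the membrane potential
theta = np.full_like(t, theta0 + a * (-70.0))

for i in range(1, len(t)):
    theta_inf = theta0 + a * V[i - 1]
    theta[i] = theta[i - 1] + dt * (theta_inf - theta[i - 1]) / tau_theta

print(f"threshold before the step: {theta[int(40/dt)]:.1f} mV, long after the step: {theta[-1]:.1f} mV")
# The threshold follows the depolarization with a delay set by tau_theta,
# i.e. it is higher when the membrane potential was recently higher.
```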

In our work with Jonathan Platkiewicz, we started from this inductive insight, and then followed a deductive process. The biophysics of spike initiation is described by the Hodgkin-Huxley equations. Hodgkin and Huxley got the Nobel prize in 1963 for showing how ionic mechanisms interact to generate spikes in the squid giant axon. They used a quantitative model (four differential equations) that they fitted to their measurements. They were then able to accurately predict the velocity of spike propagation along the axon. As a side note, this mathematical model, which explicitly refers to ionic channels, was established well before these channels could be directly observed (by Neher and Sakmann, who then also got the Nobel prize in 1991). Thus this discovery was not data-driven at all, but rather hypothesis-driven.

In the Hodgkin-Huxley model, spikes are initiated by the opening of sodium channels, which let a positive current enter the cell when the membrane potential is high enough, triggering a positive feedback process. These channels also inactivate (more slowly) when the membrane potential increases, and when they inactivate the spike threshold increases. This is one mechanism by which the spike threshold can adapt to the membrane potential. Another way, in the Hodgkin-Huxley equations, is by the opening of potassium channels when the membrane potential increases. In this model, we then derived an equation describing how the spike threshold depends on these ionic channels, and then a differential equation describing how it evolves with the membrane potential. This is a purely deductive process (which also involves approximations), and it also predicts that the spike threshold adapts to the membrane potential. Yet it provides new theoretical knowledge, compared to the inductive process. First, it shows that threshold adaptation is consistent with Hodgkin-Huxley equations, an established biophysical theory. This is not so surprising, but given that other hypotheses could be formulated (see e.g. the axonal inhibition hypothesis I mentioned above), it strengthens this hypothesis. Secondly, it shows under what conditions on ionic channel properties the theory can be consistent with the empirical data. This provides new ways to test the theory (by measuring ionic channel properties) and therefore increases its empirical content. Thirdly, the equation we proposed is slightly different from those previously proposed by induction. That is, the theory predicts that the spike threshold only adapts above a certain potential, otherwise it is fixed. This is a prediction that is not obvious from the published data, and therefore could not have been made by a purely inductive process. Thus, a deductive process is also a fruitful theoretical methodology, even though it is in some sense “purely theoretical”, that is, accounting for empirical facts is not part of the theory-making process itself (except for motivating the work).

In the second paper, we also used a deductive process to understand what threshold adaptation implies for synaptic integration. For example, we show that incoming spikes interact at the timescale of threshold adaptation, rather than of the membrane time constant. Note how the goal of this theoretical work now is not to account for empirical facts or explain mechanisms, but to provide a new interpretative framework for these facts. The theory redefines what should be considered significant – in this case, the distance to threshold rather than the absolute membrane potential. This is an important remark, because it implies that theoretical work is not only about making new experimental predictions, but also about interpreting experimental observations and possibly orienting future experiments.

We then concluded the paper with a counter-inductive line of reasoning. Different ionic mechanisms may contribute to threshold adaptation, in particular sodium channel inactivation and potassium channel activation. We argued that the former was more likely, because it is more energetically efficient (the latter requires both sodium and potassium channels to be open and counteract each other, implying considerable ionic traffic). This argument is not empirical: it relies on the idea that neurons should be efficient based on evolutionary theory (a theoretical argument) and on the fact that the brain has been shown to be efficient in many other circumstances (an argument by analogy). It is not based on empirical evidence, and worse, it is contradicted by empirical evidence. Indeed, blocking Kv1 channels abolishes threshold dynamics. I then reason counter-inductively to make my theoretical statement compatible with this observation. I first note that removing the heart of a man prevents him from thinking, but it does not imply that thoughts are produced by the heart. This is an epistemological argument (discarding the methodology as inappropriate). Secondly, I was told by a colleague (unpublished observation) that suppressing Kv1 moves the spike initiation site to the node of Ranvier (discarding the data as being irrelevant or abnormal). Thirdly, I can quantitatively account for the results with our theory, by noting that suppressing any channel can globally shift the spike threshold and possibly move the minimum threshold below the half-inactivation voltage of sodium channels, in which case there is no more threshold variability. These are three counter-inductive arguments that are perfectly reasonable. One might not be convinced by them, but they cannot be discarded as being intrinsically wrong. Since it is possible that I am right, counter-inductive reasoning is a useful scientific methodology. Note also how counter-inductive reasoning can suggest new experiments, for example testing whether suppressing Kv1 moves the initiation site to the node of Ranvier.

In summary, there are different types of theoretical work. They differ not so much in content as in methodology: deduction, induction and counter-induction. All three types of methodologies are valid and fruitful, and they should be recognized as such, noting that they have different logics and possibly different aims.

 

Update. It occurred to me that I use the word “induction” to refer to the making of a law from a series of observations, but it seems that this process is often subdivided into two different processes, induction and abduction. In this sense, induction is the making of a law from a series of observations in the sense of “generalizing”: for example, reasoning by analogy or fitting a curve to empirical data. Abduction is the finding of a possible underlying cause that would explain the observations. Thus abduction is more creative and seems more uncertain: it is the making of a hypothesis (among other possible hypotheses), while induction is rather the direct generalization of empirical data together with accepted knowledge. For example, data-driven neural modeling is a sort of inductive process. One builds a model from measurements and implicit accepted knowledge about neural biophysics – which generally comes with an astounding number of implicit hypotheses and approximations, e.g. electrotonic compactness or the idea that ionic channel properties are similar across cells and related species. The model accounts for the set of measurements, but it also predicts responses in an infinite number of situations. In my view, induction is the weakest form of theoretical process because there is no attempt to go beyond the data. Empirical data are seen as a series of unconnected weather observations that just need to be included in the already existing theory.

What is computational neuroscience? (V) A side note on Paul Feyerabend

Paul Feyerabend was a philosopher of science who defended an anarchist view of science (in his book “Against Method”). That is, he opposed the idea that there should be methodologies imposed in science, because he considered that these are the expression of conservatism. One may not agree with all his conclusions (some think of him as defending relativistic views), but his arguments are worth considering. By looking at the Copernican revolution, Feyerabend makes a strong case that the methodologies proposed by philosophers (e.g. falsificationism) have failed both as a description of scientific activity and as a prescription of "good" scientific activity. That is, in the history of science, new theories that ultimately replace established theories are initially in contradiction with established scientific facts. If they had been judged by the standards of falsificationism for example, they would have been immediately falsified. Yet the Copernican view (the Earth revolves around the sun) ultimately prevailed over the Ptolemaic system (the Earth is at the center of the universe). Galileo firmly believed in heliocentrism not for empirical reasons (it did not explain more data) but because it “made more sense”, that is, it seemed like a more elegant explanation of the apparent trajectories of planets. See e.g. the picture below (taken from Wikipedia) showing the motion of the Sun, the Earth and Mars in both systems:

It appears clearly in this picture that there is no more empirical content in the heliocentric view, but it seems more satisfactory. At the time though, heliocentrism could be easily disproved with simple arguments, such as the tower argument: when a stone falls from the top of a tower, it falls right beneath it, while it should be “left behind” if the Earth were moving. This is a solid empirical fact, easily reproducible, which falsifies heliocentrism. It might seem foolish to us today, but it seems so only because we know that the Earth moves. If we look again at the picture above, we see two theories that both account for the apparent trajectories of planets, but the tower argument corroborates geocentrism while it falsifies heliocentrism. Therefore, so Feyerabend concludes, scientific methodologies that are still widely accepted today (falsificationism) would immediately discard heliocentrism. It follows that these are not only a poor description of how scientific theories are made, but they are also a dangerous prescription of scientific activity, for they would not have allowed the Copernican revolution to occur.

Feyerabend then goes on to argue that the development of new theories follows a counter-inductive process. This, I believe, is a very deep observation. When a new theory is introduced, it initially contradicts a number of established scientific facts, such as the tower argument. Therefore, the theory develops by making the scientific facts agree with the theory, for example by finding an explanation for the fact that the stone falls right beneath the point where it was dropped. Note that these explanations may take a lot of time to be made convincingly, and that they do not constitute the core of the theory. This stands in sharp contrast with induction, in which a theory is built so as to account for the known facts. Here it is the theory itself (e.g. a philosophical principle) that is considered true, while the facts are re-interpreted so as to agree with it.

I want to stress that these arguments do not support relativism, i.e., the idea that all scientific theories are equally valid, depending on the point of view. To make this point clear, I will make an analogy with a notion that is familiar to physicists, the energy landscape:

This is very schematic but perhaps it helps make the argument. In the picture above, I represent on the vertical axis the amount of disagreement between a theory (on the horizontal axis) and empirical facts. This disagreement could be seen as the “energy” that one wants to minimize. The standard inductive process consists in incrementally improving a theory so as to minimize this energy (a sort of “gradient descent”). This process may stabilize into an established theory (the “current theory” in the picture). However, it is very possible that a better theory, empirically speaking, cannot be developed by this process, because it requires a change in paradigm, something that cannot be obtained by incremental changes to the established theory. That is, there is an “energy barrier” between the two theories. Passing through this barrier requires an empirical regression, in which the newly introduced theory is initially worse than the current theory in accounting for the empirical facts.

This analogy illustrates the idea that it can be necessary to temporarily deviate from the empirical facts so as to ultimately explain more of them. This does not mean that empirical facts do not matter, but simply that explaining more and more empirical facts should not be elevated to the rank of “the good scientific methodology”. There are other scientific processes that are both valid as methodologies and necessary for scientific progress. I believe this is how the title of Feyerabend’s book, “Against Method”, should be understood.

What is computational neuroscience? (IV) Should theories explain the data?

Since there is such an obvious answer, you might anticipate that I am going to question it! More precisely, I am going to analyze the following statement: a good theory is one that explains the maximum amount of empirical data while being as simple as possible. I will argue that 1) this is not stupid at all, but that 2) it cannot be a general criterion to distinguish good and bad theories, and finally that 3) it is only a relevant criterion for orthodox theories, i.e., theories that are consistent with theories that produced the data. The arguments are not particularly original, I will mostly summarize points made by a number of philosophers.

First of all, given a finite set of observations, there are an infinite number of universal laws that agree with the observations, so the problem is underdetermined. This is the skeptical criticism of inductivism. Which theory to choose then? One approach is "Occam's razor", i.e., the idea that among competing hypotheses, the most parsimonious one should be preferred. But of course, Karl Popper and others would argue that it cannot be a valid criterion to distinguish between theories, because it could still be that the more complex hypothesis predicts future observations better than the simpler hypothesis - there is just no way to know without doing the new experiments. Yet it is not absurd as a heuristic to develop theories. This is a known fact in the field of machine learning for example, related to the problem of "overfitting". If one wants to describe the relationship between two quantities x and y, from a set of n examples (xi,yi), one could perfectly fit a polynomial of degree n-1 to the data. It would completely explain the data, yet it would be very unlikely to fit a new example. In fact, a lower-dimensional relationship is more likely to account for new data, and this can be shown more rigorously with the tools of statistical learning theory. Thus there is a trade-off between how much of the data is accounted for and the simplicity of the theory. So, Occam's Razor is actually a very sensible heuristic to produce theories. But it should not be confused with a general criterion to discard theories.
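Here is the standard toy illustration of that trade-off (with made-up data): a polynomial of degree n-1 fits the n observed points essentially exactly but tends to generalize poorly, while a lower-degree fit misses some points yet usually predicts new ones better.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)                      # the "true" relationship
x_train = np.linspace(0, 3, 8)
y_train = f(x_train) + 0.3 * rng.standard_normal(x_train.size)
x_test = np.linspace(0, 3, 50)

for degree in (2, 7):                        # degree 7 = n-1 for n = 8 points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - f(x_test)) ** 2)
    print(f"degree {degree}: train error {train_err:.4f}, test error {test_err:.4f}")
# The degree-7 polynomial "explains" the training data essentially exactly,
# but it usually does worse than the simpler fit on new points.
```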

The interim conclusion is: a theory should account for the data, but not at the expense of being as complicated as the data itself. Now I will make criticisms that are deeper, and mostly based on post-Popper philosophers such as Kuhn, Lakatos and Feyerabend. In a nutshell, the argument is that insisting that a theory should explain empirical data is a kind of inversion of what science is about. Science is about understanding the real world, by making theories and testing them with carefully designed experiments. These experiments are usually done using conditions that are very unecological, and this is justified by the fact that they are designed to test a specific hypothesis in a controlled way. For example, the laws of mechanics would be tested in conditions where there is no friction, a condition that actually almost never happens in the real world - and this is absolutely fine methodology. But then insisting that a new theory should be evaluated by how much it explains the empirical data is what I would call the "empiricist inversion": empirical data were produced, using very peculiar conditions justified by the theory that motivated the experiments, and now we demand that any theory should explain this data. One obvious point, which was made by Kuhn and Feyerabend, is that it gives a highly unfair advantage to the first theory, just because it was there first. But it is actually worse than this, because it also means that the criterion to judge theories is now disconnected from what was meant to be explained in the first place by the theory that produced the data. Here is the empiricist inversion: we consider that theories should explain data, when actually data is produced to test theories. What a theory is meant to explain is the world; data is only used as a methodological tool to test theories of the world.

In summary, this criterion then tends to produce theories of data, not theories of the world. This point in fact relates to the arguments of Gibson, who criticized psychological research for focusing on laboratory stimuli rather than ecological conditions. Of course simplified laboratory stimuli are used to control experiments precisely, but it should always be kept in mind that these simplified stimuli are used as methodological tools and not as the things that are meant to be explained. In neural modeling, I find that many models are developed to explain experimental data, ignoring the function of the models (i.e., the “computational level” in Marr’s analysis framework). In my view, this is characteristic of the empiricist inversion, which results in models of the data, not models of the brain.

At this point, my remarks might start being confusing. On one hand I am saying that it is a good idea to try to account for the data with a simple explanation, on the other hand I am saying that we should not care so much about the data. These seemingly contradictory statements can still make sense because they apply to different types of theories. This is related to what Thomas Kuhn termed “normal science” and “revolutionary science”. These terms might sound a bit too judgmental so I will rather speak of “orthodox theories” and “non-orthodox theories”. The idea is that science is structured by paradigm shifts. Between such shifts, a central paradigm dominates. Data are obtained through this paradigm, anomalies are also explained through this paradigm (rather than being seen as falsifications), and a lot of new scientific results are produced by “puzzle solving”, i.e., trying to explain data. At some point, for various reasons (e.g. too many unexplained anomalies), the central paradigm shifts to a new one and the process starts again, but with new data, new methods, or new ways to look at the observations.

“Orthodox theories” are theories developed within the central paradigm. These try to explain the data obtained with this paradigm, the “puzzle-solving” activity. Here it makes sense to consider that a good theory is a simple explanation of the empirical data. But this kind of criterion cannot explain paradigm shifts. A paradigm shift requires the development of non-orthodox theories, for which the existing empirical data may not be adequate. Therefore the making of non-orthodox theories follows a different logic. Because the existing data were obtained with a different paradigm, these theories are not driven by the data, although they may be motivated by some anomalous set of data. For example they may be developed from philosophical considerations or by analogy. The logic of their construction might be better described by counter-induction rather than induction (a concept proposed by Feyerabend). That is, their development starts from a theoretical principle, rather than from data, and existing data are deconstructed so as to fit the theory. By this process, implicit assumptions of the central paradigm are uncovered, and this might ultimately trigger new experiments and produce new experimental data that may be favorable to the new theory.

Recently, there has been a lot of discussion in the fields of neuroscience and computational neuroscience about the availability of massive amounts of data. Many consider it a great opportunity that should change the way we work and build models. It certainly seems like a good thing to have more data, but I would like to point out that it mostly matters for the development of orthodox theories. Putting too much emphasis (and resources) on it also raises the danger of driving the field away from non-orthodox theories, which in the end are the ones that bring scientific revolutions (with the caveat that of course most non-orthodox theories turn out to be wrong). Being myself unhappy with current orthodox theories in neuroscience, I see this danger as quite significant.

This was a long post and I will now try to summarize. I started with the provocative question: should a theory explain the data? First of all, a theory that explains every single bit of data is an enumeration of data, not a theory. It is unlikely to predict any new significant fact. This point is related to overfitting or the “curse of dimensionality” in statistical learning. A better theory is one that explains a lot of the data with a simple explanation, a principle known as Occam’s razor. However, this criterion should be thought of as a heuristic to develop theories, not a clear-cut general decision criterion between theories. In fact, this criterion is relevant mostly for orthodox theories, i.e., those theories that follow the central paradigm with which most data have been obtained. Non-orthodox theories, on the other hand, cannot be expected to explain most of the data obtained through a different paradigm (at least initially). It can be seen that in fact they are developed through a counter-inductive process, by which data are made consistent with the theory. This process may fail to produce new empirical facts consistent with the new theory (most often) or it may succeed and subsequently become the new central paradigm - but this is usually a long process.

The intelligence of slime molds

Slime molds are fascinating: these are unicellular organisms that can display complex behaviors such as finding the shortest path in a maze and developing an efficient transportation network. Actually each of these two findings generated a high-impact publication (Science and Nature) and an Ig Nobel prize. In the latter study, the authors grew a slime mold on a map of Japan, with food placed on the biggest cities, and demonstrated that it developed a transportation network that looked very much like the railway network of Japan (check out the video!).

More recently, a PNAS paper showed that a slime mold can solve the “U-shaped trap problem”. This is a classic spatial navigation problem in robotics: the organism is on one side of a U-shaped barrier and food is placed on the other side. It cannot navigate to the food using local rules (e.g. following a path along which the distance to the food continuously decreases), and therefore the task requires some form of spatial memory. This is not a trivial task for robots, but the slime mold can do it (check out the video).

What I find particularly interesting is that the slime mold has no brain (it is a single cell!), and yet it displays behavior that requires some form of spatial memory. The way it manages to do the task is that it leaves extracellular slime behind it and uses it to mark the locations it has already visited. It can then explore its environment by avoiding extracellular slime, and it can go around the U-shaped barrier. Thus it uses an externalized memory. This is a concrete example that shows that (neural) representation is not always necessary for complex cognition. It nicely illustrates Rodney Brooks’ famous quote: “The world is its own best model”. That is, why develop a complex map of the external world when you can directly interact with it?

Of course, we humans don’t usually leave slime on the floor to help us navigate. But this example should make us think about the nature of spatial memory. We tend to think of spatial memory in terms of maps, in analogy with actual maps that we can draw on paper. However, it is now possible to imagine other ways in which a spatial memory could work, in analogy with the slime mold. For example, one might imagine a memory system that leaves “virtual slime” in places that have already been explored, that is, that associates environmental cues about location with a “slime signal”. This would confer the same navigational abilities as those of slime molds, without a map-like representation of the world. For the organism, having markers in the hippocampus (the brain area involved in spatial memory) or outside the skull might not make a big difference (does the mind stop at the boundary of the skull?).
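To make the idea of a “virtual slime” memory concrete, here is a toy sketch (a made-up grid world and local rule, not a model of the actual organism): the agent deposits a mark on every cell it visits and locally prefers the least-marked neighboring cell, breaking ties by distance to the food. This purely local rule, with externalized memory, is enough to get around a U-shaped barrier without any map.

```python
import numpy as np

# 0 = free, 1 = wall; the U-shaped barrier opens downwards, food lies behind its closed end
grid = np.zeros((9, 9), dtype=int)
grid[2, 2:7] = 1          # top of the U
grid[2:6, 2] = 1          # left arm
grid[2:6, 6] = 1          # right arm
food, agent = (1, 4), (7, 4)   # food behind the barrier, agent below the open side
marks = np.zeros_like(grid, dtype=float)

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < grid.shape[0] and 0 <= cc < grid.shape[1] and grid[rr, cc] == 0:
            yield (rr, cc)

def dist(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

path = [agent]
for step in range(2000):                    # generous step cap: this is only a sketch
    if agent == food:
        break
    marks[agent] += 1.0                     # leave "slime" at the current cell
    # local rule: least-marked free neighbor, ties broken by distance to the food
    agent = min(neighbors(agent), key=lambda n: (marks[n], dist(n, food)))
    path.append(agent)

print(f"reached the food in {len(path) - 1} steps" if agent == food else "gave up (step cap)")
```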

It is known that in mammals, there are cells in the hippocampus that fire at a specific (preferred) location. These are called “place cells”. What if the meaning of spikes fired by these place cells were that there is “slime” at their preferred location? Of course I realize that this is a provocative question, which might not go so well with other known facts about the hippocampal formation, such as grid cells (cells that fire when the animal is at the nodes of a regular spatial grid). But it makes the point that maps, in the usual sense, may not be the only way in which these experimental observations can be interpreted. That is, the neural basis of spatial memory could be thought of as operational (neurons fire to trigger some behavior) rather than representational (the world is reconstructed from spike trains).

Rate vs. timing (X) Rate theories in spiking network models

According to the rate-based hypothesis, 1) neural activity can be entirely described by the dynamics of underlying firing rates and 2) firing is independent between neurons, conditionally on these rates. This hypothesis can be investigated in models of spiking neural networks by a self-consistency strategy. If all inputs to a neuron are independent Poisson processes, then the output firing rate can be calculated as a function of input rates. Rates in the network are then solutions of a fixed point equation. This has been investigated in random networks in particular by Nicolas Brunel. In a statistically homogeneous network, theory gives the stationary firing rate, which can be compared to numerical simulations. The approach has also been applied to calculate self-sustained oscillations (time-varying firing rates) in such networks. In general, theory works nicely for sparse random networks, in which a pair of neurons is connected with a low probability. Sparseness implies that there are no short cycles in the connectivity graph, so that the strong dependence between a neuron’s output and its own inputs has little impact on the dynamics. Results of simulations diverge from theory when the connection probability increases. This means that the rate-based hypothesis is not true in general. On the contrary, it relies on specific hypotheses.
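The self-consistency strategy can be sketched in a few lines (a generic sketch with a made-up sigmoidal transfer function and random coupling, not Brunel's actual mean-field equations): the stationary rates are a fixed point of r = F(W·r + external input), which can be found by iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                         # number of (rate) units
coupling = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))   # random coupling matrix
i_ext = rng.uniform(0.5, 1.5, n)                # external drive

def transfer(x):
    """Made-up transfer function: input current -> firing rate (a.u.)."""
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))

r = np.zeros(n)
for _ in range(200):                            # fixed-point iteration r = F(W r + I)
    r_new = transfer(coupling @ r + i_ext)
    if np.max(np.abs(r_new - r)) < 1e-9:
        break
    r = r_new

print(f"self-consistent mean rate: {r.mean():.3f} (a.u.)")
```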

Real neural networks do not look like sparse random networks: for example, they can be strongly connected locally, and neurons can be bidirectionally connected or form clusters. Recently, there have been a number of nice theoretical papers on densely connected balanced networks (Renart et al., Science 2010; Litwin-Kumar and Doiron, Nat Neurosci 2012), which many have interpreted as supporting rate-based theories. In such networks, when inhibition precisely counteracts excitation, the excitatory correlations (due to shared inputs) are cancelled by the coordination between inhibition and excitation. As a result, pairwise correlations between neurons are very weak. I hope it is now clear from my previous posts that this is not an argument in favor of rate-based theories: the fact that correlations are small says nothing about whether the dynamics can be faithfully described by underlying time-varying rates.

In fact, in such networks, neurons are in a fluctuation-driven regime, meaning that they are highly sensitive to coincidences. What inhibition does is to cancel the correlations due to shared inputs, i.e., the meaningless correlations. But this is precisely what one would want the network to do in spike-based schemes based on stimulus-specific synchrony (detecting coincidences that are unlikely to occur by chance) or on predictive coding (firing when there is a discrepancy between input and prediction). In summary, these studies do not support the idea that rates are an adequate basis for describing network dynamics. They show how it is possible to cancel expected correlations, a useful mechanism in both rate-based and spike-based theories.

Update. These observations highlight the difference between correlation and synchrony. Correlations are meant as temporal averages, for example pairwise cross-correlations. But on a timescale relevant to behavior, temporal averages are irrelevant; what might be relevant are spatial averages. Synchrony, in contrast, generally refers to the fact that a number of neurons fire at the same time, or that a number of spikes arrive at the same time at a postsynaptic neuron. This is a transient event, which may not be repeated. A single such event is meaningful if the synchrony (possibly involving many neurons) is unlikely to occur by chance, where “by chance” refers to what could be expected given the past history of spiking events. This is precisely what coordinated inhibition may correspond to in the scheme described above: the predicted level of input correlations. In this sense, inhibition can be tuned to cancel the expected correlations, but by definition it cannot cancel coincidences that are not expected. Thus, the effect of such excitation-inhibition coordination is precisely to enhance the salience of unexpected synchrony.
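
To get a sense of the numbers, one can ask, under the null hypothesis that a specific group of presynaptic neurons fire as independent Poisson processes at known rates, how likely it is that several of them spike within the same short window. A minimal sketch, with purely illustrative parameters (scipy assumed):

```python
from scipy.stats import binom

# Null hypothesis: a specific group of N presynaptic neurons fire as
# independent Poisson processes at known rates. How likely is it that at
# least k of them spike within the same short window? Numbers are illustrative.

N = 30        # size of the monitored presynaptic group
nu = 5.0      # firing rate of each neuron (Hz)
dt = 0.005    # coincidence window (s)

p = nu * dt   # probability that a given neuron spikes within the window
for k in (3, 5, 10):
    print(f"P(>= {k:2d} coincident spikes by chance) = {binom.sf(k - 1, N, p):.1e}")
```

An event that is vanishingly improbable under this null is exactly the kind of unexpected synchrony that, by definition, coordinated inhibition cannot cancel.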

Rate vs. timing (IX) The fluctuation-driven regime and the Softky-Shadlen debate

In the 1990s, there was a famous published exchange about the rate vs. timing debate, between Softky and Koch on one side and Shadlen and Newsome on the other. Softky and Koch argued that if spike trains were random, as they seemed to be in single-unit recordings, and cortical neurons sum many inputs, then by the law of large numbers their output should be regular, since the total input would be approximately constant. Therefore, so they argued, there is an inconsistency between the two hypotheses (independence of inputs and integration). They proposed to resolve it by postulating that neurons do not sum their inputs but rather detect coincidences at a millisecond timescale, using dendritic nonlinearities. Shadlen and Newsome showed that the two hypotheses are in fact not contradictory if one postulates that the total mean input is subthreshold, so that spikes only occur when the total input fluctuates above its average. This is called the “fluctuation-driven regime”, and it is a fairly well accepted hypothesis nowadays. When there are many inputs, it can arise when excitation is balanced by inhibition, hence the other standard name, the “balanced regime” (note that balanced implies fluctuation-driven, but not the other way round). An electrophysiological signature of this regime is a distribution of membrane potential that peaks well below threshold (instead of increasing monotonically towards threshold).
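
Here is a minimal simulation sketch of this regime, assuming a standard leaky integrate-and-fire neuron driven by Poisson inputs whose mean excitation and inhibition cancel exactly; all parameter values are illustrative and not taken from the original papers.

```python
import numpy as np

# Fluctuation-driven (balanced) regime, sketched with a leaky
# integrate-and-fire neuron receiving Poisson excitation and inhibition
# whose means cancel exactly. All parameter values are illustrative.

rng = np.random.default_rng(0)

dt, T = 1e-4, 100.0           # time step and duration (s)
tau = 0.02                    # membrane time constant (s)
N_E, N_I = 800, 200           # numbers of excitatory / inhibitory synapses
nu = 5.0                      # rate of each input (Hz)
w_E, w_I = 0.4, -1.6          # synaptic weights (mV); N_E*w_E + N_I*w_I = 0
V_th, V_reset = 10.0, 0.0     # threshold and reset, relative to rest (mV)

V, spikes, trace = 0.0, [], []
for i in range(int(T / dt)):
    nE = rng.poisson(N_E * nu * dt)      # excitatory input spikes in this step
    nI = rng.poisson(N_I * nu * dt)      # inhibitory input spikes in this step
    V += -V * dt / tau + w_E * nE + w_I * nI
    if V > V_th:
        spikes.append(i * dt)
        V = V_reset
    trace.append(V)

trace, isi = np.array(trace), np.diff(spikes)
print(f"Output rate: {len(spikes) / T:.1f} Hz")
print(f"CV of interspike intervals: {isi.std() / isi.mean():.2f}")
print(f"Mean membrane potential: {trace.mean():.1f} mV (threshold at {V_th} mV)")
```

With these (assumed) parameters, firing is driven entirely by input fluctuations: the membrane potential hovers around its mean, well below threshold, and output spiking is irregular.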

In the fluctuation-driven regime, output spikes occur irregularly, because the neuron only spikes when there is a fluctuation of the summed input. Thus the two hypotheses are not contradictory: it is entirely possible that a neuron receives independent Poisson inputs, integrates them, and fires in a quasi-Poisson way. This argument indeed makes the submillisecond coincidence detection hypothesis unnecessary. However, Softky then correctly argued that even so, output spikes are still determined by input spikes, so they cannot be seen as random. To be more precise: the input spike trains are independent Poisson processes, the output spike train is (approximately) a Poisson process, but inputs and outputs are not independent. In their reply, Shadlen and Newsome missed this argument. They showed that if they replayed the same pattern of spikes that had led to an output spike, but with a different history of inputs, then the neuron might not spike. This happened in their model for two reasons: 1) they used a variation of the perfect integrator, a very particular kind of model that is known to be unreliable, contrary to almost every other spiking neuron model and to actual neurons (Brette & Guigon 2003); 2) they considered a pattern of input spikes restricted to a window much shorter than the integration time constant of the neuron. Had they played a pattern covering one integration time window to a standard integrate-and-fire model (or any other model), they would have seen output spikes. But perhaps more importantly, even if the input pattern is restricted either temporally or to a subset of synapses, the probability that the neuron fires is much higher than chance. In other words, the output spike train is not independent of any of the input spike trains. This would appear in a cross-correlogram between any input and the output as an extra firing probability at positive lags, on the timescale of the integration time constant, with a correlation of order 1/N (since there is 1 output spike for N input spikes, assuming identical rates).

Note that this is a trivial mathematical fact if the output depends deterministically on the inputs. Yet it is a critical point in the debate. Consider the example above: all inputs are independent Poisson processes with the same constant firing rate, and the output is also a (quasi-)Poisson process with constant rate. Nevertheless, the fact that one input neuron spikes is informative about whether the output neuron will spike shortly after, even when the rates are known. In other words, rates do not fully describe the (joint) activity of the network. This directly contradicts the rate-based postulate.

Even though this means that the rate-based hypothesis is mathematically wrong (at least in this case), it might still be a good enough approximation. If one input spike is known, one gets a little extra information about whether the output neuron spikes, compared to the sole knowledge of the rates. Maybe this is a slight discrepancy. But consider: if all input spikes are known, one gets full information about the output spikes, since the process is deterministic and reliable. This is a very strong discrepancy with the rate-based hypothesis. One may then ask: if I observe p input spikes occurring together, how much can I predict about output spiking? This is the question we tried to answer in Rossant et al. (2011), and it follows an argument proposed by Abeles in the 1980s. In a fluctuation-driven regime, if one observes just one input spike, chances are that the membrane potential is far from threshold, and the neuron is very unlikely to fire. But if, say, 10 spikes are observed, each producing a 1 mV depolarization, and the threshold is about 10 mV above the mean potential, then there is a 50% chance of observing an output spike. Abeles called the ratio between the extra firing produced by 10 coincident spikes and by 10 independent spikes the “coincidence advantage”, and it is a huge number. Consider again: if you only know the input rates, then there is a 5% chance of observing a spike in a 10 ms window, for an output neuron firing at 5 Hz; if you additionally know that 10 spikes have just arrived together, then there is a 50% chance of observing an output spike. This is a huge change, obtained by observing just 0.1% of all synapses (assuming 10,000 synapses).
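
The arithmetic behind these numbers can be made explicit under a simple assumption: approximate the membrane potential distribution in the fluctuation-driven regime by a Gaussian around its mean, with the threshold 10 mV above the mean and 1 mV of depolarization per input spike, as above; the standard deviation is an assumed value chosen only for illustration.

```python
from scipy.stats import norm

# Worked version of the coincidence-advantage argument (Abeles; Rossant et
# al. 2011), assuming a Gaussian membrane potential distribution in the
# fluctuation-driven regime. sigma is an assumed value for illustration.

sigma = 3.0        # std of membrane potential fluctuations (mV), assumed
theta = 10.0       # threshold relative to the mean potential (mV)
w = 1.0            # depolarization per input spike (mV)
rate_out = 5.0     # baseline output firing rate (Hz)
window = 0.01      # observation window (s)

# Knowing only the rates: expected probability of a spike in the window
print(f"P(spike | rates only)           ~ {rate_out * window:.1%}")

# Knowing that p spikes arrived together: the neuron fires if the ongoing
# fluctuation exceeds theta - p*w
for p in (1, 5, 10):
    print(f"P(spike | {p:2d} coincident spikes) ~ {norm.sf(theta - p * w, scale=sigma):.1%}")
```

With 10 coincident spikes, the depolarization covers the whole mean-to-threshold distance, so the neuron fires whenever the ongoing fluctuation happens to be above the mean, that is, about half the time; this is where the 50% figure comes from.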

Thus, it is difficult to argue here that rates entirely determine the activity of the network. Simply put, the fact that the input-output function of neurons is essentially deterministic introduces strong correlations between input and output spike trains. This is a simple fact, and it is well known in the theoretical literature on neural network dynamics. For example, one line of research, initiated mainly by Nicolas Brunel, tries to determine the firing rates (average and time-varying) of networks of spiking models using a self-consistent analysis. This is notably difficult to do in general in the fluctuation-driven regime, precisely because of the correlations introduced by the spiking process. The standard way around it is to consider sparse networks with random connectivity, which ensures that there are no short cycles in the connectivity graph, and therefore that the inputs to a given neuron are approximately independent. But the theoretical predictions break down when this hypothesis is not satisfied. It is in fact a challenge in theoretical neuroscience to extend this type of analysis to networks with realistic connectivity, that is, with short cycles and non-random structure.

It is interesting to note that the concept of the balanced, or fluctuation-driven, regime was proposed in the 1990s as a way to support rate-based theories. Yet the analysis shows that it is specifically in this regime, and not in the mean-driven regime, that 1) neurons are essentially deterministic, 2) they are highly sensitive to the relative timing of input spikes, and 3) there is strong coordination between input and output spikes. The rate-based postulate is not valid at all in this regime.