What is computational neuroscience? (VII) Incommensurability and relativism

I explained in previous posts that new theories should not be judged by their agreement with the current body of empirical data, because these data were produced by the old theory. In the new theory, they may be interpreted very differently or even considered irrelevant. A few philosophers have gone so far as to state that different theories are incommensurable, that is, that they cannot be compared with each other because they have different logics (e.g. observations are not described in the same way in the different theories). This reasoning may lead to relativistic views of science, that is, the idea that all theories are equally “good” and that the choice between them is a matter of personal taste or fashion. In this post I will try to explain these arguments, and also to reject relativism.

In “Against Method”, Feyerabend explains that scientific theories are defined in a relational way, that is, elements of a theory make sense only in reference to other elements of the theory. I believe this is a very deep remark that applies to theories of knowledge in the broadest sense, including perception for example. Below, I drew a schematic figure to illustrate the arguments.

Theories are systems of thought that relate to the world. Concepts in a theory are meant to relate to the world, and they are defined with respect to other concepts in the theory. A given concept in a given theory may have a similar concept in another theory, but in general it is a different concept. To explain his arguments, Feyerabend uses the analogy of language. It is a good analogy because languages relate to the world, and they have an internal relational structure. Imagine theories A and B are two languages. A word in language A is defined (e.g. in the dictionary) by using other words from language A. A child learns her native language by picking up the relationship between the words, and how they relate to the world she can see. To understand language A, a native speaker of language B may translate the words. However, translation is not definition. It is imprecise because the two words often do not have exactly the same meaning in both languages. Some words may not even exist in one language. A deeper understanding of language A requires going beyond translation, and capturing the meaning of words by acquiring a more global understanding of the language, both in its internal structure and in its relationship with the world.

Another analogy one could make is political theories, in how they view society. Clearly, a given observation can be interpreted in opposite ways in conservative and liberal political views. For example, the same economic crisis could be seen as the result of public debt or as the result of public cuts in spending (due to public acquisition of private debt).

These analogies support the argument that an element of a new theory may not be satisfactorily explained in the framework of the old theory. It may only make full sense when embedded in the full structure of the new theory – which means that new theories may be initially unclear and that the concepts may not be well defined. This remark can certainly make different theories difficult to compare, but I would not conclude that theories are incommensurable. This conclusion would be valid if theories were closed systems, because then a given statement would make no sense outside the context of the theory in which it is formulated. Axiomatic systems in mathematics could be said to be incommensurable (for example, Euclidean and non-Euclidean geometries). But theories of knowledge, unlike axiomatic systems, are systems that relate to the world, and the world is shared between different theories (as illustrated in the drawing above). For this reason, translation is imprecise but not arbitrary, and one may still assess the degree of consistency between a scientific theory and the part of the world it is meant to explain.

One may find an interesting example in social psychology. In the theory of cognitive dissonance, new facts that seem to contradict our belief system are taken into account by minimally adjusting that belief system (minimizing the “dissonance” between the facts and the theory). In philosophy of knowledge, these adjustments would be called “ad hoc hypotheses”. When it becomes too difficult to account for all the contradictory facts (making the theory too cumbersome), the belief system may ultimately collapse. This is very similar to the theory of knowledge defended by Imre Lakatos, where belief systems are replaced by research programs. Cognitive dissonance theory was introduced on the basis of a field study of a small American sect whose members believed that the end of the world would occur at a specific date (Festinger, Riecken and Schachter (1956), When Prophecy Fails. University of Minnesota Press). When the said date arrived and the world did not end, strangely enough, the sect did not collapse. On the contrary, the failed prophecy made it stronger: the followers believed even more firmly in their view of the world. They considered that the world did not end because they prayed so much and God heard their prayers and postponed the event. So they made a new prediction, which of course turned out to be false. The sect ultimately collapsed, although only after a surprisingly long time.

This example illustrates two points. Firstly, a theory does not collapse because one prediction is falsified. Instead, the theory is adjusted with a minor modification so as to account for the seemingly contradictory observation. But this process does not go on forever, because of its interaction with the world: when predictions are systematically falsified, the theory ultimately loses its followers, and for a good reason.

In summary, a theory of knowledge is a system in interaction with the world. It has an internal structure, and it also relates to the world. And although it may relate to the world in its own words, one may still assess the adequacy of this relationship. For this reason, one may not defend scientific relativism in its strongest version.

For the reader of my other posts in this blog, this definition of theories of knowledge might sound familiar. Indeed, it is closely related to theories of perception defended by Gibson, O’Regan and Varela, for example. After all, perception is a form of knowledge about the world. These authors have in common that they define perception in a relational way, as the relationship between the actions of the organism in the world (driven by “theory”) and the effects of these actions on the organism (“tests” of the theory). This is in contrast with “neurophysiological subjectivism”, for which meaning is intrinsically produced by the brain (a closed system, in my drawing above), and “computational objectivism”, in which there is a pre-defined objective world (related to the idea of translation).

What is computational neuroscience? (VI) Deduction, induction, counter-induction

At this point, it should be clear that there is not a single type of theoretical work. I believe most theoretical work can be categorized into three broad classes: deduction, induction, and counter-induction. Deduction is deriving theoretical knowledge from previous theoretical knowledge, with no direct reference to empirical facts. Induction is the process of making a theory that accounts for the available empirical data, in general in a parsimonious way (Occam’s razor). Counter-induction is the process of making a theory based on non-empirical considerations (for example philosophical principles or analogy) or on a subset of empirical observations that are considered significant, and re-interpreting empirical facts so that they agree with the new theory. Note that 1) all these processes may lead to new empirical predictions, 2) a given line of research may use all three types of processes.

For illustration, I will discuss the work done in my group on the dynamics of spike threshold (see these two papers with Jonathan Platkiewicz: “A Threshold Equation for Action Potential Initiation” and “Impact of Fast Sodium Channel Inactivation on Spike Threshold Dynamics and Synaptic Integration”). It is certainly not the most well-known line of research and therefore it will require some explanation. However, since I know it so well, it will be easier to highlight the different types of theoretical thinking – I will try to show how all three types of processes were used.

I will first briefly summarize the scientific context. Neurons communicate with each other by spikes, which are triggered when the membrane potential reaches a threshold value. It turns out that, in vivo, the spike threshold is not a fixed value even within a given neuron. Many empirical observations show that it depends on the stimulation, and on various aspects of the previous activity of the neuron, e.g. its previous membrane potential and the previously triggered spikes. For example, the spike threshold tends to be higher when the membrane potential was previously higher. By induction, one may infer that the spike threshold adapts to the membrane potential. One may then derive a first-order differential equation describing the process, in which the threshold adapts to the membrane potential with some characteristic time constant. Such phenomenological equations have been proposed in the past by a number of authors, and they are qualitatively consistent with a number of properties seen in the empirical data. But note that an inductive process can only produce a hypothesis. The data could be explained by other hypotheses. For example, the threshold could be modulated by an external process, say inhibition targeted at the spike initiation site, which would co-vary with the somatic membrane potential. However, the hypothesis could potentially be tested. For example, an experiment could be done in which the membrane potential is actively modified by an electrode injecting current: if threshold modulation is external, spike threshold should not be affected by this perturbation. So an inductive process can be a fruitful theoretical methodology.
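
To make this concrete, here is one minimal phenomenological form that such an equation could take. This is only an illustrative sketch: the symbols θ0, a, V0 and τθ are placeholders, not the specific parameters or functional form used in the papers discussed below.

$$\tau_\theta \frac{d\theta}{dt} = \theta_\infty(V) - \theta, \qquad \theta_\infty(V) = \theta_0 + a\,(V - V_0)$$

Here θ is the spike threshold, V the membrane potential and τθ the adaptation time constant: the threshold relaxes toward a target value that increases with the recent membrane potential.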

In our work with Jonathan Platkiewicz, we started from this inductive insight, and then followed a deductive process. The biophysics of spike initiation is described by the Hodgkin-Huxley equations. Hodgkin and Huxley got the Nobel prize in 1963 for showing how ionic mechanisms interact to generate spikes in the squid giant axon. They used a quantitative model (four differential equations) that they fitted to their measurements. They were then able to accurately predict the velocity of spike propagation along the axon. As a side note, this mathematical model, which explicitly refers to ionic channels, was established well before these channels could be directly observed (by Neher and Sakmann, who then also got the Nobel prize in 1991). Thus this discovery was not data-driven at all, but rather hypothesis-driven.
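
For reference, the four differential equations couple the membrane potential V to three gating variables: m and h for sodium channel activation and inactivation, n for potassium channel activation.

$$C_m \frac{dV}{dt} = I - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_K\, n^4 (V - E_K) - g_L (V - E_L)$$
$$\frac{dx}{dt} = \alpha_x(V)(1 - x) - \beta_x(V)\, x, \qquad x \in \{m, h, n\}$$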

In the Hodgkin-Huxley model, spikes are initiated by the opening of sodium channels, which let a positive current enter the cell when the membrane potential is high enough, triggering a positive feedback process. These channels also inactivate (more slowly) when the membrane potential increases, and when they inactivate the spike threshold increases. This is one mechanism by which the spike threshold can adapt to the membrane potential. Another way, in the Hodgkin-Huxley equations, is by the opening of potassium channels when the membrane potential increases. In this model, we then derived an equation describing how the spike threshold depends on these ionic channels, and then a differential equation describing how it evolves with the membrane potential. This is a purely deductive process (which also involves approximations), and it also predicts that the spike threshold adapts to the membrane potential. Yet it provides new theoretical knowledge, compared to the inductive process. First, it shows that threshold adaptation is consistent with Hodgkin-Huxley equations, an established biophysical theory. This is not so surprising, but given that other hypotheses could be formulated (see e.g. the axonal inhibition hypothesis I mentioned above), it strengthens this hypothesis. Secondly, it shows under what conditions on ionic channel properties the theory can be consistent with the empirical data. This provides new ways to test the theory (by measuring ionic channel properties) and therefore increases its empirical content. Thirdly, the equation we proposed is slightly different from those previously proposed by induction. That is, the theory predicts that the spike threshold only adapts above a certain potential, otherwise it is fixed. This is a prediction that is not obvious from the published data, and therefore could not have been made by a purely inductive process. Thus, a deductive process is also a fruitful theoretical methodology, even though it is in some sense “purely theoretical”, that is, accounting for empirical facts is not part of the theory-making process itself (except for motivating the work).
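
The qualitative prediction can be illustrated with a minimal simulation sketch. The piecewise form of the target threshold (adaptation only above a critical potential Vi) and all parameter values below are illustrative assumptions, not the equation or fitted parameters from the papers:

```python
import numpy as np

# Illustrative sketch: a spike threshold that relaxes toward a target which
# depends on the membrane potential only above a critical potential Vi.
# All parameter values are made up for illustration.
dt = 0.1e-3          # time step (s)
tau_theta = 10e-3    # threshold adaptation time constant (s)
theta0 = -55e-3      # minimum threshold (V)
Vi = -60e-3          # below this potential, the threshold does not adapt (V)
a = 0.5              # adaptation strength (dimensionless)
tau_v = 10e-3        # time constant of the toy voltage fluctuations (s)

rng = np.random.default_rng(0)
t = np.arange(0, 1.0, dt)
V = np.empty_like(t)
theta = np.empty_like(t)
V[0], theta[0] = -65e-3, theta0
for i in range(1, len(t)):
    # Toy subthreshold voltage: noisy fluctuations around -65 mV (no spikes here)
    V[i] = (V[i-1] + dt*(-65e-3 - V[i-1])/tau_v
            + 5e-3*np.sqrt(2*dt/tau_v)*rng.standard_normal())
    # The threshold adapts only when V exceeds Vi, otherwise it relaxes to theta0
    target = theta0 + a*max(V[i-1] - Vi, 0.0)
    theta[i] = theta[i-1] + dt*(target - theta[i-1])/tau_theta

# theta stays near theta0 while V < Vi, and tracks V (with a lag) above Vi.
```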

In the second paper, we also used a deductive process to understand what threshold adaptation implies for synaptic integration. For example, we show that incoming spikes interact at the timescale of threshold adaptation, rather than of the membrane time constant. Note how the goal of this theoretical work now is not to account for empirical facts or explain mechanisms, but to provide a new interpretative framework for these facts. The theory redefines what should be considered significant – in this case, the distance to threshold rather than the absolute membrane potential. This is an important remark, because it implies that theoretical work is not only about making new experimental predictions, but also about interpreting experimental observations and possibly orienting future experiments.

We then concluded the paper with a counter-inductive line of reasoning. Different ionic mechanisms may contribute to threshold adaptation, in particular sodium channel inactivation and potassium channel activation. We argued that the former was more likely, because it is more energetically efficient (the latter requires both sodium and potassium channels to be open and counteract each other, implying considerable ionic traffic). This argument is not empirical: it relies on the idea that neurons should be efficient based on evolutionary theory (a theoretical argument) and on the fact that the brain has been shown to be efficient in many other circumstances (an argument by analogy). It is not based on empirical evidence, and worse, it is contradicted by empirical evidence. Indeed, blocking Kv1 channels abolishes threshold dynamics. I then reason counter-inductively to make my theoretical statement compatible with this observation. I first note that removing the heart of a man prevents him from thinking, but it does not imply that thoughts are produced by the heart. This is an epistemological argument (discarding the methodology as inappropriate). Secondly, I was told by a colleague (unpublished observation) that suppressing Kv1 moves the spike initiation site to the node of Ranvier (discarding the data as being irrelevant or abnormal). Thirdly, I can quantitatively account for the results with our theory, by noting that suppressing any channel can globally shift the spike threshold and possibly move the minimum threshold below the half-inactivation voltage of sodium channels, in which case there is no more threshold variability. These are three counter-inductive arguments that are perfectly reasonable. One might not be convinced by them, but they cannot be discarded as being intrinsically wrong. Since it is possible that I am right, counter-inductive reasoning is a useful scientific methodology. Note also how counter-inductive reasoning can suggest new experiments, for example testing whether suppressing Kv1 moves the initiation site to the node of Ranvier.

In summary, there are different types of theoretical work. They differ not so much in content as in methodology: deduction, induction and counter-induction. All three types of methodologies are valid and fruitful, and they should be recognized as such, noting that they have different logics and possibly different aims.

 

Update. It occurred to me that I use the word “induction” to refer to the making of a law from a series of observations, but it seems that this process is often subdivided into two different processes, induction and abduction. In this sense, induction is the making of a law from a series of observations in the sense of “generalizing”: for example, reasoning by analogy or fitting a curve to empirical data. Abduction is the finding of a possible underlying cause that would explain the observations. Thus abduction is more creative and seems more uncertain: it is the making of a hypothesis (among other possible hypotheses), while induction is rather the direct generalization of empirical data together with accepted knowledge. For example, data-driven neural modeling is a sort of inductive process. One builds a model from measurements and implicit accepted knowledge about neural biophysics – which generally comes with an astounding number of implicit hypotheses and approximations, e.g. electrotonic compactness or the idea that ionic channel properties are similar across cells and related species. The model accounts for the set of measurements, but it also predicts responses in an infinite number of situations. In my view, induction is the weakest form of theoretical process because there is no attempt to go beyond the data. Empirical data are treated like a series of unconnected weather observations that just need to be accommodated by the already existing theory.

What is computational neuroscience? (V) A side note on Paul Feyerabend

Paul Feyerabend was a philosopher of science who defended an anarchist view of science (in his book “Against Method”). That is, he opposed the idea that there should be methodologies imposed in science, because he considered that these are the expression of conservatism. One may not agree with all his conclusions (some think of him as defending relativistic views), but his arguments are worth considering. By looking at the Copernican revolution, Feyerabend makes a strong case that the methodologies proposed by philosophers (e.g. falsificationism) have failed both as a description of scientific activity and as a prescription of "good" scientific activity. That is, in the history of science, new theories that ultimately replace established theories are initially in contradiction with established scientific facts. If they had been judged by the standards of falsificationism for example, they would have been immediately falsified. Yet the Copernican view (the Earth revolves around the Sun) ultimately prevailed over the Ptolemaic system (the Earth is at the center of the universe). Galileo firmly believed in heliocentrism not for empirical reasons (it did not explain more data) but because it “made more sense”, that is, it seemed like a more elegant explanation of the apparent trajectories of planets. See e.g. the picture below (taken from Wikipedia) showing the motion of the Sun, the Earth and Mars in both systems:

It appears clearly in this picture that there is no more empirical content in the heliocentric view, but it seems more satisfactory. At the time though, heliocentrism could be easily disproved with simple arguments, such as the tower argument: when a stone falls from the top of a tower, it falls right beneath it, while it should be “left behind” if the Earth were moving. This is a solid empirical fact, easily reproducible, which falsifies heliocentrism. It might seem foolish to us today, but only because we know that the Earth moves. If we look again at the picture above, we see two theories that both account for the apparent trajectories of planets, but the tower argument corroborates geocentrism while it falsifies heliocentrism. Therefore, so Feyerabend concludes, scientific methodologies that are still widely accepted today (falsificationism) would have immediately discarded heliocentrism. It follows that these are not only a poor description of how scientific theories are made, but they are also a dangerous prescription of scientific activity, for they would not have allowed the Copernican revolution to occur.

Feyerabend then goes on to argue that the development of new theories follows a counter-inductive process. This, I believe, is a very deep observation. When a new theory is introduced, it initially contradicts a number of established scientific facts, such as the tower argument. Therefore, the theory develops by making the scientific facts agree with the theory, for example by finding an explanation for the fact that the stone falls right beneath the point where it was dropped. Note that these explanations may take a lot of time to be made convincingly, and that they do not constitute the core of the theory. This stands in sharp contrast with induction, in which a theory is built so as to account for the known facts. Here it is the theory itself (e.g. a philosophical principle) that is considered true, while the facts are re-interpreted so as to agree with it.

I want to stress that these arguments do not support relativism, i.e., the idea that all scientific theories are equally valid, depending on the point of view. To make this point clearly, I will make an analogy with a notion that is familiar to physicists, energy landscape:

This is very schematic but perhaps it helps make the argument. In the picture above, I represent on the vertical axis the amount of disagreement between a theory (on the horizontal axis) and empirical facts. This disagreement could be seen as the “energy” that one wants to minimize. The standard inductive process consists in incrementally improving a theory so as to minimize this energy (a sort of “gradient descent”). This process may stabilize into an established theory (the “current theory” in the picture). However, it is very possible that a better theory, empirically speaking, cannot be developed by this process, because it requires a change in paradigm, something that cannot be obtained by incremental changes to the established theory. That is, there is an “energy barrier” between the two theories. Passing through this barrier requires an empirical regression, in which the newly introduced theory is initially worse than the current theory in accounting for the empirical facts.
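
To make the analogy concrete, here is a toy numerical sketch; the energy function is arbitrary and stands for nothing more than the drawing. Pure gradient descent, the analogue of incremental theory improvement, settles into the nearest minimum and never crosses the barrier toward the deeper one.

```python
# Toy double-well "energy landscape": a shallow minimum on the right
# (current theory) and a deeper one on the left (better theory),
# separated by a barrier.
def energy(x):
    return 0.1*x**4 - x**2 + 0.5*x

def grad(x, eps=1e-6):
    return (energy(x + eps) - energy(x - eps)) / (2*eps)

x = 2.0                      # start near the shallow minimum
for _ in range(5000):        # incremental improvement = gradient descent
    x -= 0.01*grad(x)

print("settled at x = %.2f, energy = %.2f" % (x, energy(x)))
# The deeper minimum (near x = -2.3) is never reached: getting there would
# require temporarily increasing the "disagreement with the data".
```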

This analogy illustrates the idea that it can be necessary to temporarily deviate from the empirical facts so as to ultimately explain more of them. This does not mean that empirical facts do not matter, but simply that explaining more and more empirical facts should not be elevated to the rank of “the good scientific methodology”. There are other scientific processes that are both valid as methodologies and necessary for scientific progress. I believe this is how the title of Feyerabend’s book, “Against Method”, should be understood.

What is computational neuroscience? (IV) Should theories explain the data?

Since there is such an obvious answer, you might anticipate that I am going to question it! More precisely, I am going to analyze the following statement: a good theory is one that explains the maximum amount of empirical data while being as simple as possible. I will argue that 1) this is not stupid at all, but that 2) it cannot be a general criterion to distinguish good and bad theories, and finally that 3) it is only a relevant criterion for orthodox theories, i.e., theories that are consistent with the theories that produced the data. The arguments are not particularly original; I will mostly summarize points made by a number of philosophers.

First of all, given a finite set of observations, there are an infinite number of universal laws that agree with the observations, so the problem is underdetermined. This is the skeptical criticism of inductivism. Which theory to choose then? One approach is "Occam's razor", i.e., the idea that among competing hypotheses, the most parsimonious one should be preferred. But of course, Karl Popper and others would argue that it cannot be a valid criterion to distinguish between theories, because it could still be that the more complex hypothesis predicts future observations better than the simpler hypothesis - there is just no way to know without doing the new experiments. Yet it is not absurd as a heuristic to develop theories. This is a known fact in the field of machine learning for example, related to the problem of "overfitting". If one wants to describe the relationship between two quantities x and y, from a set of n examples (xi,yi), one could fit a polynomial of degree n-1 that passes exactly through the data. It would completely explain the data, and yet it would be very unlikely to fit a new example. In fact, a lower-dimensional relationship is more likely to account for new data, and this can be shown more rigorously with the tools of statistical learning theory. Thus there is a trade-off between how much of the data is accounted for and the simplicity of the theory. So, Occam's Razor is actually a very sensible heuristic to produce theories. But it should not be confused with a general criterion to discard theories.
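
This can be illustrated with a small sketch on toy data (the data and model choices below are mine, for illustration only): a polynomial of degree n-1 fits the n observations exactly, but generalizes worse to new points than a much simpler polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.linspace(0, 1, n)
y = np.sin(2*np.pi*x) + 0.2*rng.standard_normal(n)      # noisy observations

# Degree n-1: passes (essentially) exactly through every data point
complex_fit = np.polynomial.Polynomial.fit(x, y, deg=n-1)
# Degree 3: a much simpler "theory" of the same data
simple_fit = np.polynomial.Polynomial.fit(x, y, deg=3)

# New observations from the same underlying process
x_new = rng.uniform(0, 1, 1000)
y_new = np.sin(2*np.pi*x_new) + 0.2*rng.standard_normal(1000)

print("training error, degree n-1: %.4f" % np.mean((complex_fit(x) - y)**2))
print("training error, degree 3:   %.4f" % np.mean((simple_fit(x) - y)**2))
print("test error, degree n-1:     %.4f" % np.mean((complex_fit(x_new) - y_new)**2))
print("test error, degree 3:       %.4f" % np.mean((simple_fit(x_new) - y_new)**2))
```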

The interim conclusion is: a theory should account for the data, but not at the expense of being as complicated as the data itself. Now I will make criticisms that are deeper, and mostly based on post-Popper philosophers such as Kuhn, Lakatos and Feyerabend. In a nutshell, the argument is that insisting that a theory should explain empirical data is a kind of inversion of what science is about. Science is about understanding the real world, by making theories and testing them with carefully designed experiments. These experiments are usually done using conditions that are very unecological, and this is justified by the fact that they are designed to test a specific hypothesis in a controlled way. For example, the laws of mechanics would be tested in conditions where there is no friction, a condition that actually almost never happens in the real world - and this is absolutely fine methodology. But then insisting that a new theory should be evaluated by how much it explains the empirical data is what I would call the "empiricist inversion": empirical data were produced, using very peculiar conditions justified by the theory that motivated the experiments, and now we demand that any theory should explain this data. One obvious point, which was made by Kuhn and Feyerabend, is that it gives a highly unfair advantage to the first theory, just because it was there first. But it is actually worse than this, because it also means that the criterion to judge theories is now disconnected from what was meant to be explained in the first place by the theory that produced the data. Here is the empiricist inversion: we consider that theories should explain data, when actually data is produced to test theories. What a theory is meant to explain is the world; data is only used as a methodological tool to test theories of the world.

In summary, this criterion then tends to produce theories of data, not theories of the world. This point in fact relates to the arguments of Gibson, who criticized psychological research for focusing on laboratory stimuli rather than ecological conditions. Of course simplified laboratory stimuli are used to control experiments precisely, but it should always be kept in mind that these simplified stimuli are used as methodological tools and not as the things that are meant to be explained. In neural modeling, I find that many models are developed to explain experimental data, ignoring the function of the models (i.e., the “computational level” in Marr’s analysis framework). In my view, this is characteristic of the empiricist inversion, which results in models of the data, not models of the brain.

At this point, my remarks might start being confusing. On one hand I am saying that it is a good idea to try to account for the data with a simple explanation, on the other hand I am saying that we should not care so much about the data. These seemingly contradictory statements can still make sense because they apply to different types of theories. This is related to what Thomas Kuhn termed “normal science” and “revolutionary science”. These terms might sound a bit too judgmental so I will rather speak of “orthodox theories” and “non-orthodox theories”. The idea is that science is structured by paradigm shifts. Between such shifts, a central paradigm dominates. Data are obtained through this paradigm, anomalies are also explained through this paradigm (rather than being seen as falsifications), and a lot of new scientific results are produced by “puzzle solving”, i.e., trying to explain data. At some point, for various reasons (e.g. too many unexplained anomalies), the central paradigm shifts to a new one and the process starts again, but with new data, new methods, or new ways to look at the observations.

“Orthodox theories” are theories developed within the central paradigm. These try to explain the data obtained with this paradigm, the “puzzle-solving” activity. Here it makes sense to consider that a good theory is a simple explanation of the empirical data. But this kind of criterion cannot explain paradigm shifts. A paradigm shift requires the development of non-orthodox theories, for which the existing empirical data may not be adequate. Therefore the making of non-orthodox theories follows a different logic. Because the existing data were obtained with a different paradigm, these theories are not driven by the data, although they may be motivated by some anomalous set of data. For example they may be developed from philosophical considerations or by analogy. The logic of their construction might be better described by counter-induction rather than induction (a concept proposed by Feyerabend). That is, their development starts from a theoretical principle, rather than from data, and existing data are deconstructed so as to fit the theory. By this process, implicit assumptions of the central paradigm are uncovered, and this might ultimately trigger new experiments and produce new experimental data that may be favorable to the new theory.

Recently, there has been a lot of discussion in the fields of neuroscience and computational neuroscience about the availability of massive amounts of data. Many consider it a great opportunity that should change the way we work and build models. It certainly seems like a good thing to have more data, but I would like to point out that it mostly matters for the development of orthodox theories. Putting too much emphasis (and resources) on it also raises the danger of driving the field away from non-orthodox theories, which in the end are the ones that bring scientific revolutions (with the caveat that of course most non-orthodox theories turn out to be wrong). Being myself unhappy with current orthodox theories in neuroscience, I see this danger as quite significant.

This was a long post and I will now try to summarize. I started with the provocative question: should a theory explain the data? First of all, a theory that explains every single bit of data is an enumeration of data, not a theory. It is unlikely to predict any new significant fact. This point is related to overfitting or the “curse of dimensionality” in statistical learning. A better theory is one that explains a lot of the data with a simple explanation, a principle known as Occam’s razor. However, this criterion should be thought of as a heuristic to develop theories, not a clear-cut general decision criterion between theories. In fact, this criterion is relevant mostly for orthodox theories, i.e., those theories that follow the central paradigm with which most data have been obtained. Non-orthodox theories, on the other hand, cannot be expected to explain most of the data obtained through a different paradigm (at least initially). It can be seen that in fact they are developed through a counter-inductive process, by which data are made consistent with the theory. This process may fail to produce new empirical facts consistent with the new theory (most often) or it may succeed and subsequently become the new central paradigm - but this is usually a long process.

The intelligence of slime molds

Slime molds are fascinating: these are unicellular organisms that can display complex behaviors such as finding the shortest path in a maze and developing an efficient transportation network. In fact, each of these two findings generated a high-impact publication (in Science and Nature, respectively) and an Ig Nobel prize. In the latter study, the authors grew a slime mold on a map of Japan, with food placed on the largest cities, and demonstrated that it developed a transportation network that looked very much like the railway network of Japan (check out the video!).

More recently, a PNAS paper showed that a slime mold can solve the “U-shaped trap problem”. This is a classic spatial navigation problem in robotics: the organism starts behind a U-shaped barrier and there is food on the other side. It cannot navigate to the food using local rules (e.g. following a path along which the distance to the food continuously decreases), so the task requires some form of spatial memory. This is not a trivial task for robots, but the slime mold can do it (check out the video).

What I find particularly interesting is that the slime mold has no brain (it is a single cell!), and yet it displays behavior that requires some form of spatial memory. The way it manages to do the task is that it leaves extracellular slime behind it and uses it to mark the locations it has already visited. It can then explore its environment by avoiding extracellular slime, and it can go around the U-shaped barrier. Thus it uses an externalized memory. This is a concrete example that shows that (neural) representation is not always necessary for complex cognition. It nicely illustrates Rodney Brooks’s famous quote: “The world is its own best model”. That is, why develop a complex map of the external world when you can directly interact with it?

Of course, we humans don’t usually leave slime on the floor to help us navigate. But this example should make us think about the nature of spatial memory. We tend to think of spatial memory in terms of maps, in analogy with actual maps that we can draw on a paper. However, it is now possible to imagine other ways in which a spatial memory could work, in analogy with the slime mold. For example, one might imagine a memory system that leaves “virtual slime” in places that have been already explored, that is, that associates environmental cues about location with a “slime signal”. This would confer the same navigational abilities as those of slime molds, without a map-like representation of the world. For the organism, having markers in the hippocampus (the brain area involved in spatial memory) or outside the skull might not make a big difference (does the mind stop at the boundary of the skull?).

It is known that in mammals, there are cells in the hippocampus that fire at a specific (preferred) location. These are called “place cells”. What if the spikes fired by these place cells meant that there is “slime” in their preferred place? Of course I realize that this is a provocative question, which might not sit so well with other known facts about the hippocampus, such as grid cells (cells that fire when the animal is at nodes of a regular spatial grid). But it makes the point that maps, in the usual sense, may not be the only way in which these experimental observations can be interpreted. That is, the neural basis of spatial memory could be thought of as operational (neurons fire to trigger some behavior) rather than representational (the world is reconstructed from spike trains).

Rate vs. timing (X) Rate theories in spiking network models

According to the rate-based hypothesis, 1) neural activity can be entirely described by the dynamics of underlying firing rates and 2) firing is independent between neurons, conditionally on these rates. This hypothesis can be investigated in models of spiking neural networks by a self-consistency strategy. If all inputs to a neuron are independent Poisson processes, then the output firing rate can be calculated as a function of input rates. Rates in the network are then solutions of a fixed-point equation. This has been investigated in random networks in particular by Nicolas Brunel. In a statistically homogeneous network, theory gives the stationary firing rate, which can be compared to numerical simulations. The approach has also been applied to calculate self-sustained oscillations (time-varying firing rates) in such networks. In general, theory works nicely for sparse random networks, in which a pair of neurons is connected with a low probability. Sparseness implies that there are no short cycles in the connectivity graph, so that the strong dependence of a neuron’s output on its inputs has little impact on the dynamics. Results of simulations diverge from theory when the connection probability increases. This means that the rate-based hypothesis is not true in general. On the contrary, it relies on specific hypotheses.
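
As a caricature of this self-consistency strategy, here is a sketch in which a made-up sigmoidal rate transfer function stands in for the actual diffusion-approximation formulas, with arbitrary parameters:

```python
import numpy as np

# Self-consistency sketch for a statistically homogeneous network:
# the stationary rate r must satisfy r = f(external drive + recurrent drive(r)).
# The sigmoidal transfer function f and all parameters are illustrative
# stand-ins, not the formulas of the actual theory.
def f(total_input, r_max=50.0, x0=3000.0, k=300.0):
    return r_max / (1.0 + np.exp(-(total_input - x0)/k))

C_ext, r_ext = 1000, 2.0     # external synapses per neuron and their rate (Hz)
C_rec, w_rec = 400, 0.1      # recurrent synapses and their relative weight

r = 5.0                      # initial guess (Hz)
for _ in range(200):         # fixed-point iteration
    r = f(C_ext*r_ext + w_rec*C_rec*r)

print("self-consistent stationary rate: %.2f Hz" % r)
```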

Real neural networks do not look like sparse random networks: for example, they can be strongly connected locally, and neurons can be bidirectionally connected or form clusters. Recently, there have been a number of nice theoretical papers on densely connected balanced networks (Renart et al., Science 2010; Litwin-Kumar and Doiron, Nat Neurosci 2012), which some have interpreted as supporting rate-based theories. In such networks, when inhibition precisely counteracts excitation, excitatory correlations (due to shared inputs) are cancelled by the coordination between inhibition and excitation. As a result, there are very weak pairwise correlations between neurons. I hope it is now clear from my previous posts that this is not an argument in favor of rate-based theories. The fact that correlations are small says nothing about whether dynamics can be faithfully described by underlying time-varying rates.

In fact, in such networks, neurons are in a fluctuation-driven regime, meaning that they are highly sensitive to coincidences. What inhibition does is to cancel the correlations due to shared inputs, i.e., the meaningless correlations. But this is precisely what one would want the network to do in spike-based schemes based on stimulus-specific synchrony (detecting coincidences that are unlikely to occur by chance) or on predictive coding (firing when there is a discrepancy between input and prediction). In summary, these studies do not support the idea that rates are an adequate basis for describing network dynamics. They show how it is possible to cancel expected correlations, a useful mechanism in both rate-based and spike-based theories.

Update. These observations highlight the difference between correlation and synchrony. Correlations are meant as temporal averages, for example pairwise cross-correlation. But on a timescale relevant to behavior, temporal averages are irrelevant. What might be relevant are spatial averages. Thus, synchrony is generally meant as the fact that a number of neurons fire at the same time, or a number of spikes arrive at the same time at a postsynaptic neuron. This is a transient event, which may not be repeated. A single event is meaningful if such synchrony (possibly involving many neurons) is unlikely to occur by chance. The terms “by chance” refer to what could be expected given the past history of spiking events. This is precisely what coordinated inhibition may correspond to in the scheme described above: the predicted level of input correlations. In this sense, inhibition can be tuned to cancel the expected correlations, but by definition it cannot cancel coincidences that are not expected. Thus, the effect of such an excitation-inhibition coordination is precisely to enhance the salience of unexpected synchrony.

Rate vs. timing (IX) The fluctuation-driven regime and the Softky-Shadlen debate

In the 1990s, there was a famous published exchange about the rate vs. timing debate, between Softky and Koch on one side, and Shadlen and Newsome on the other side. Softky and Koch argued that if spike trains were random, as they seemed to be in single unit recordings, and cortical neurons sum many inputs, then by the law of large numbers their output should be regular, since the total input would be approximately constant. Therefore, so they argued, there is an inconsistency in the two hypotheses (independence of inputs and integration). They proposed to resolve it by postulating that neurons do not sum their inputs but rather detect coincidences at a millisecond timescale, using dendritic nonlinearities. Shadlen and Newsome demonstrated that the two hypotheses are in fact not contradictory, if one postulates that the total mean input is subthreshold, so that spikes only occur when the total input fluctuates above its average. This is called the “fluctuation-driven regime”, and it is a fairly well accepted hypothesis nowadays. When there are many inputs, this can happen when excitation is balanced by inhibition, hence the other standard name “balanced regime” (note that balanced implies fluctuation-driven, but not the other way round). An electrophysiological signature of this regime is a distribution of membrane potential that peaks well below threshold (instead of monotonically increasing towards threshold).
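
Here is a minimal sketch of this regime (all parameters are illustrative): a leaky integrate-and-fire neuron whose balanced excitatory and inhibitory inputs are summarized as a subthreshold mean drive plus white-noise fluctuations. The membrane potential distribution peaks well below threshold and the output spike train is irregular.

```python
import numpy as np

# Fluctuation-driven regime: mean drive 10 mV below threshold, spikes are
# triggered by fluctuations. Parameters are illustrative.
dt = 0.1e-3
tau = 20e-3                   # membrane time constant (s)
V_th, V_reset = -50e-3, -60e-3
mu = -60e-3                   # mean drive (V), subthreshold
sigma = 5e-3                  # size of the fluctuations (V)

rng = np.random.default_rng(1)
T = 100.0
n = int(T/dt)
noise = rng.standard_normal(n)
V = mu
Vm = np.empty(n)
spikes = []
for i in range(n):
    V += dt*(mu - V)/tau + sigma*np.sqrt(2*dt/tau)*noise[i]
    if V > V_th:
        spikes.append(i*dt)
        V = V_reset
    Vm[i] = V

isi = np.diff(spikes)
print("rate: %.1f Hz, CV of ISIs: %.2f" % (len(spikes)/T, isi.std()/isi.mean()))
# A histogram of Vm peaks near mu, well below V_th, and firing is irregular.
```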

In the fluctuation-driven regime, output spikes occur irregularly, because the neuron only spikes when there is a fluctuation of the summed input. Thus the two hypotheses are not contradictory: it is completely possible that a neuron receives independent Poisson inputs, integrates them, and fires in a quasi-Poisson way. This argument indeed makes the submillisecond coincidence detection hypothesis unnecessary. However, Softky then correctly argued that even then, output spikes are still determined by input spikes, so they cannot be seen as random. To be more precise: input spike trains are independent Poisson processes, the output spike train is (approximately) a Poisson process, but inputs and outputs are not independent. In their reply, Shadlen and Newsome miss this argument. They show that if they replay the same pattern of spikes to the neuron that led to a spike, but with a different history of inputs, then the neuron may not spike. This happened in their model for two reasons: 1) they used a variation of the perfect integrator, a very particular kind of model that is known to be unreliable, contrary to almost every other spiking neuron model, and to actual neurons (Brette & Guigon 2003), 2) they considered a pattern of input spikes restricted to a window much shorter than the integration time constant of the neuron. If they had played a pattern covering one integration time window to a standard integrate-and-fire model (or any other model), then they would have seen output spikes. But perhaps more importantly, even if the input pattern is restricted either temporally or to a subset of synapses, the probability that the neuron fires is much higher than chance. In other words, the output spike train is not independent of any of the input spike trains. This would appear in a cross-correlogram between any input and the output, as an extra firing probability at positive lags, on the timescale of the integration time constant, with a correlation of order 1/N (since there is 1 output spike for N input spikes, assuming identical rates).

Note that this is a trivial mathematical fact, if the output depends deterministically on the inputs. Yet, it is a critical point in the debate. Consider: here is an elementary example in which all inputs are independent Poisson processes with the same constant firing rate, and the output is also a (quasi-) Poisson process with constant rate. But the fact that one input neuron spikes is informative about whether the output neuron will spike shortly after, even when the rate of that input neuron is known. In other words, rates do not fully describe the (joint) activity of the network. This is a direct contradiction of the rate-based postulate.

Even though this means that the rate-based hypothesis is mathematically wrong (at least in this case), it may still be that it is a good enough approximation. If one input spike is known, one gets a little bit of extra information about whether the output neuron spikes, compared to the sole knowledge of the rates. Maybe this is a slight discrepancy. But consider: if all input spikes are known, one gets full information about the output spikes, since the process is deterministic and reliable. This is a very strong discrepancy with the rate-based hypothesis. One may ask the question: if I observe p input spikes occurring together, how much can I predict about output spiking? This is the question we tried to answer in Rossant et al. (2011), and it follows an argument proposed by Abeles in the 1980s. In a fluctuation-driven regime, if one observes just one input spike, chances are that the membrane potential is far from threshold, and the neuron is very unlikely to fire. But if, say, 10 spikes are observed, each producing a 1 mV depolarization, and the threshold is about 10 mV above the mean potential, then there is a 50% chance of observing an output spike. Abeles called the ratio between the extra firing produced by 10 independent spikes and by 10 coincident spikes the “coincidence advantage”, and it is a huge number. Consider again: if you only know the input rates, then there is a 5% chance of observing a spike in a 10 ms window, for an output neuron firing at 5 Hz; if you additionally know that 10 spikes have been fired, then there is a 50% chance of observing an output spike. This is a huge change, involving the observation of just 0.1% of all synapses (assuming 10,000 synapses).
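
Here is a rough numerical version of this argument, reusing the toy fluctuation-driven neuron from the sketch above (again, all numbers are illustrative, not those of Rossant et al.): the probability of firing within 10 ms is estimated with and without 10 extra coincident 1 mV EPSPs.

```python
import numpy as np

# Sketch of the "coincidence advantage": compare the firing probability in a
# 10 ms window with and without 10 coincident 1 mV EPSPs at time 0.
dt = 0.1e-3
tau, V_th = 20e-3, -50e-3
mu, sigma = -60e-3, 5e-3      # mean drive 10 mV below threshold, as before
rng = np.random.default_rng(2)

def fires_within(window, extra_epsps):
    V = mu + sigma*rng.standard_normal()   # start from the stationary distribution
    V += extra_epsps*1e-3                  # instantaneous 1 mV depolarizations
    for _ in range(int(window/dt)):
        if V > V_th:
            return True
        V += dt*(mu - V)/tau + sigma*np.sqrt(2*dt/tau)*rng.standard_normal()
    return False

trials = 2000
p0 = np.mean([fires_within(10e-3, 0) for _ in range(trials)])
p10 = np.mean([fires_within(10e-3, 10) for _ in range(trials)])
print("P(spike in 10 ms), rates only:          %.3f" % p0)
print("P(spike in 10 ms), 10 coincident EPSPs: %.3f" % p10)
```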

Thus, it is difficult to argue here that rates entirely determine the activity of the network. Simply put, the fact that the input-output function of neurons is essentially deterministic introduces strong correlations between input and output spike trains. It is a simple fact, and it is well known in the theoretical literature about neural network dynamics. For example, one line of research, initiated mainly by Nicolas Brunel, tries to determine the firing rates (average and time-varying) of networks of spiking models, using a self-consistent analysis. It is notably difficult to do this in general in the fluctuation-driven regime, because of the correlations introduced by the spiking process. To solve it, the standard hypothesis is to consider sparse networks with a random connectivity. This ensures that there is no short cycle in the connectivity graph, and therefore that inputs to a given neuron are approximately independent. But the theoretical predictions break down when this hypothesis is not satisfied. It is in fact a challenge in theoretical neuroscience to extend this type of analysis to networks with realistic connectivity – i.e., with short cycles and non-random connectivity.

It is interesting to note that the concept of the balanced or fluctuation-driven regime was proposed in the 1990s as a way to support rate-based theories. In fact, analysis shows that it is specifically in this regime, and not in the mean-driven regime, that 1) neurons are essentially deterministic, 2) neurons are highly sensitive to the relative timing of input spikes, 3) there is a strong coordination between input and output spikes. The rate-based postulate is not valid at all in this regime.

Rate vs. timing (VIII) A summary of arguments and evidence

I have identified the rate-based hypothesis as a methodological postulate, according to which neural activity can entirely be described by underlying rates, which are abstract variables. In general, individual spikes are then seen as instantiations of a random point process with the given rates. It can also be seen as a hypothesis of independence between the algorithmic and physical levels, in David Marr’s terminology. On the contrary, spike-based theories consider that algorithms are defined at the spike level.

What is the empirical evidence? I will start by showing that most arguments that have been used in this debate do not actually help us distinguish between the two alternatives.

Variability. Perhaps the most used argument against spike-based theories is the fact that spike trains in vivo are variable both temporally and over trials, and yet this might well be the least relevant argument. I addressed this point in detail in my third post, so I will only briefly summarize. Within one recording, inter-spike intervals (ISIs) are highly variable. This has been used as a sign that spike trains are instantiations of random point processes. But variability of ISIs is also a requirement of spike-based theories, because it increases the amount of information available in spike timing (mathematically, entropy in the ISI distribution is maximal for Poisson processes). More generally, temporal or spatial variability can never be a way to distinguish between random and deterministic schemes, because the entropy of a distribution reflects either the information content (if it is used for the code) or the amount of randomness (if it cannot be used). This brings us to the argument of variability across trials, that is, the lack of reproducibility of neural responses to the same stimulus. But this is a category error: these observations tell us that responses are stochastic, not that the activity can be fully described by rates. Therefore, it is an argument in the stochastic vs. deterministic debate, not in the rate vs. spike debate. In addition, it is a weak argument because only the stimulus is controlled. The state of the brain (e.g. due to attention, or any other aspect of the network dynamics) is not. In some cases, the sensory inputs themselves are not fully controlled (e.g. eye movements in awake animals). Therefore, the lack of reproducibility may represent either true stochasticity or simply reflect the uncertainty about uncontrolled variables, which may still be accessible to the nervous system. The lack of reproducibility itself is also contentious, at least in subcortical and primary cortical areas. But since I pointed out that this is not a very relevant argument anyway, I will not comment on this evidence (although I should note that strong reproducibility at spike level would be an argument against rate-based theories).

Chaos. Related to the variability arguments is the chaos argument (see my related post). It has been claimed that neural networks are chaotic. This is an interesting point, because it has been used to argue in favor of rate-based theories when really, it is an argument against them. What chaos implies is an absence of reproducibility of neural responses to a given stimulus. As I argued in the previous paragraph, by itself the argument has no value in the rate vs. spike debate. But if it is true that the lack of reproducibility is due to chaotic dynamics, then this goes against the rate-based hypothesis. Indeed, chaotic systems are deterministic, they cannot be described as random processes. In particular, the variables are not independent, and trajectories live in lower-dimensional spaces (attractors). I am not convinced that network dynamics are truly chaotic (although they might be), but if they are, then defenders of rate-based theories should rather be worried.

Selectivity curves. The concept of the selectivity curve or tuning curve has been used very much in research on sensory systems (e.g. the visual cortex). It has been found for example that many cells in the primary visual cortex fire more in response to a moving bar or grating with a specific orientation. This observation is often reported as the statement that these neurons code for orientation. Implicitly, this means that the firing rate of these neurons contains information about orientation, and that this is the information used by the rest of the system. However, this is not what these experiments tell us. They only tell us that the firing rate covaries with stimulus orientation, nothing more. This cannot be an argument for rate-based theories, because in spike-based theories, the firing rate also varies with stimuli (see my specific post). Indeed, processing stimuli with spikes requires producing spikes, and so stimulus-dependent variations in firing rate are a necessary correlate of spike-based computation. It is useful to interpret spike counts in terms of energy consumption, and with this notion in mind, what orientation selectivity curves tell us is not that the cells code for orientation, but rather that they care about orientation (or about a specific orientation). This is still quite an informative statement, but it does not tell us anything about whether the firing rate is the right quantity to look at.

Fast processing. To be fair, I will now critically examine an argument that has been used to contradict rate-based theories. It has been shown with psychophysical experiments that complex visual tasks can be performed by humans in very little time, so little time that any neuron along the processing chain may only fire once or not at all. This observation contradicts any scheme based on counting spikes over time, but it does not contradict views based on rate as a firing probability or as a spatial average – however, it does impose constraints on these views. It also rules out schemes based on interspike intervals. In other words, it discards computing schemes based on information obtained within single neurons (interspike interval or spike count) rather than across neurons (relative timing or population firing rate).

High correlations. A number of studies claim that there are significant correlations between neural responses, in some cases. For example, neurons of the LGN that share a presynaptic retinal ganglion cell tend to fire synchronously, at a short timescale. This contradicts one of the claims of rate-based theories, that firing between neurons is independent, conditionally on the underlying rates. Other studies have shown oscillations that organize spiking at different timescales (e.g. in the visual cortex, and in the hippocampus). These observations may be seen as contradicting rate-based theories (especially the former), but one could object that 1) these correlations may still not have a big impact on neural dynamics, and 2) even if they do, it is a minor modification to rate-based theory if they do not depend systematically on the stimulus. For example, opponents of oscillation-based theories would argue that oscillations are a by-product of the fact that networks are recurrent, and as feedback systems they can develop oscillations, which bear no functional significance. In the same way, fine scale correlations between neighboring LGN neurons may result from anatomical factors, but they may only amplify the thalamic input to the cortex – not a fundamental change in rate-based theory. But there are now a number of studies, in the cortex (e.g. from Singer’s lab) and in the hippocampus (e.g. from Buzsaki’s lab), that show a systematic relationship between functional aspects and oscillatory properties. Fine scale correlations have not been studied so extensively in relation to stimulus properties, but recently there was a study showing that the correlation between two neighboring LGN neurons in response to oriented gratings is tuned to orientation (Stanley et al. 2012). These cells project to cortical neurons in V1, whose firing rate is tuned to orientation. Thus, there is pretty clear evidence that correlations can be stimulus-dependent. The main question, then, is whether these correlations actually make a difference. That is, does the firing rate of a neuron depend mainly on the underlying rates of the presynaptic neurons, or can fine scale correlations (or, say, a few individual spikes) make a difference? I will come back to this question in more detail below.

Low correlations. Before I discuss the impact of correlations on neural firing, I will also comment on the opposite line of arguments. A few recent studies have actually claimed that there are weak correlations between cortical neurons. First of all, the term “weak” is generally vague, i.e., weak compared to what? Is 0.1 a weak or a strong correlation? Such unqualified statements are subjective. One would intuitively think that 0.01 is a very weak correlation, in the sense that it is probably as if it were zero. But this is mere speculation. Another statement might be that correlations are not statistically significant. This statement is objective, but not conclusive. It only means that positive correlations could not be observed given the duration of the recordings, which amounts to saying that correlations are smaller than the smallest value that could have been detected. This is not more informative than saying that there is (say) a 0.1 correlation – it is even less informative if this detection limit is not stated. So is 0.1 a weak or a strong pairwise correlation? The answer is, in general, that it is a huge correlation. As argued in Rossant et al. (2011), correlations make a huge difference to postsynaptic firing unless they are negligible compared to 1/N, where N is the number of synapses of the cell. So for a typical cortical neuron, this would mean negligible compared to 0.0001. The argument is very simple: independent inputs contribute to the membrane potential variance as N, but correlated inputs as c·N², where c is the pairwise correlation. The question, in fact, is rather how to deal with such huge correlations (more on this below).
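
To spell out the calculation behind this scaling: if a neuron sums N inputs, each contributing a term of variance σ², and the pairwise correlation coefficient between inputs is c, then

$$\mathrm{Var}\left(\sum_{i=1}^{N} x_i\right) = N\sigma^2 + N(N-1)\,c\,\sigma^2 \approx N\sigma^2\,(1 + cN).$$

The correlated term dominates as soon as c is large compared to 1/N: with N = 10,000 synapses, a pairwise correlation of 0.1 multiplies the input variance by roughly 1 + cN ≈ 1000 compared to the independent case.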

Below I will discuss in a little more detail the impact of correlations on postsynaptic firing, but before that, I would first like to stress two important facts: 1) the presence of pairwise correlations is not critical to all spike-based theories, and 2) in those spike-based theories where correlations do matter, relative spike timing is stimulus-dependent and possibly transient. Indeed, there are prominent spike-based theories built on asynchrony rather than synchrony. For example, the rank-order theory (e.g. from Thorpe’s lab) proposes that information is encoded in the relative activation order of neurons, and there is no particular role for synchrony (a minimal sketch follows this paragraph). The theory does not predict a high amount of correlations. However, this rank-order information may still manifest itself in cross-correlograms, as a stimulus-dependent asymmetry. Another example is the predictive spike coding theory defended by Sophie Denève, in which neurons fire when a specific criterion is fulfilled, so as to minimize an error. This predicts that neurons fire asynchronously – in fact in a slightly anti-correlated way. Finally, even in those theories based on synchrony, such as the one I presented recently (Brette 2012), neurons are not correlated in general. In the theory I proposed, synchrony is an unlikely event, which is detected by neurons. It is precisely because it is unlikely that it is meaningful – in this case, it signals some structure that is unlikely to be observed by chance. I have to recognize, however, that when a structured stimulus is presented, specific neuron groups fire in synchrony throughout the duration of the stimulus. I actually do not think that this should necessarily be the case (except for the binaural system). Pushing the theory further, I would argue that once the stimulus structure is established and recognized, it is not unlikely anymore, and therefore only the onset of the synchrony event is meaningful and required by the theory. Therefore, the prediction of the theory is rather that there are transient synchrony events, associated with specific properties of stimuli, which have an impact on target neurons. To summarize, spike-based theories do not generally predict strong correlations, and none of these theories predicts correlations in spontaneous activity.
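
Since the rank-order idea can be stated compactly, here is a minimal sketch (my own toy illustration, not Thorpe’s actual model): each unit fires with a latency that decreases with the strength of its input, and the code is the resulting firing order rather than the rates. Because any monotonic transformation of the inputs (a global gain change, for instance) preserves their order, the code is unchanged by it.

```python
import numpy as np

# Toy rank-order code: stronger input -> earlier spike; the code is the order.
rng = np.random.default_rng(0)
stimulus = rng.random(8)                  # activations of 8 input neurons
latencies = 1.0 / (stimulus + 1e-9)       # stronger input -> shorter latency
rank_order = np.argsort(latencies)        # the code: who fires 1st, 2nd, ...

# A global gain change (a monotonic transformation) leaves the order intact.
assert np.array_equal(rank_order, np.argsort(1.0 / (3.0 * stimulus + 1e-9)))
print("firing order:", rank_order)
```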

This post is already long, so I will finish with a brief discussion of the impact of correlations on postsynaptic firing – a longer one in the next post. As I mentioned above, very small pairwise correlations have a huge impact on postsynaptic firing. To be negligible, they should be small compared to 1/N, where N is the number of synapses of the postsynaptic neuron. Another way to look at it, which is discussed in detail in Rossant et al. (2011), is that changing the timing of a few spikes (on the order of 10 synapses, out of 10,000) has a dramatic effect on postsynaptic firing (i.e., from silent to strongly firing). This point was already made in the 1980s by Abeles. The phenomenon occurs specifically in the fluctuation-driven regime, so in the next post I will describe this regime and what it means for the debate.
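
To give a feel for what is at stake, here is a toy leaky integrate-and-fire simulation in the fluctuation-driven regime (normalized, made-up parameters; this is not the model or the numbers of Rossant et al. 2011). Balanced excitatory and inhibitory background input keeps the mean membrane potential below threshold, so output spikes are caused by fluctuations; we then compare the probability of an output spike when 10 extra input spikes arrive synchronously versus spread over 200 ms.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, tau = 1e-4, 10e-3                   # time step and membrane time constant (s)
T = 0.4                                 # trial duration (s)
n_steps = int(round(T / dt))
mu, v_thresh, v_reset = 0.8, 1.0, 0.0   # mean drive and threshold (normalized units)
w = 0.02                                # one EPSP = 2% of the distance to threshold
rate_bg = 2000.0                        # total background rate per polarity (spikes/s)

def p_spike(coincident, n_trials=200, t_extra=0.3, window=10e-3):
    """Probability of an output spike within `window` of the extra input."""
    idx_extra = int(round(t_extra / dt))
    idx_end = idx_extra + int(round(window / dt))
    extra = np.zeros(n_steps)
    if coincident:
        extra[idx_extra] = 10 * w                   # 10 synchronous extra EPSPs
    else:
        for t in np.linspace(0.1, t_extra, 10):     # same 10 EPSPs, spread over 200 ms
            extra[int(round(t / dt))] += w
    hits = 0
    for _ in range(n_trials):
        n_e = rng.poisson(rate_bg * dt, n_steps)    # excitatory background (Poisson)
        n_i = rng.poisson(rate_bg * dt, n_steps)    # inhibitory background (balanced)
        v, fired = mu, False
        for i in range(n_steps):
            v += dt / tau * (mu - v) + w * (n_e[i] - n_i[i]) + extra[i]
            if v > v_thresh:                        # threshold crossing: spike and reset
                v = v_reset
                if idx_extra <= i < idx_end:
                    fired = True
        hits += fired
    return hits / n_trials

print("P(spike | 10 coincident extra input spikes):", p_spike(True))
print("P(spike | 10 dispersed extra input spikes) :", p_spike(False))
```

With these arbitrary numbers, the coincident case should trigger an output spike in roughly half of the trials or more, while the dispersed case should stay near the spontaneous level of a few percent; the exact figures are meaningless, but the contrast is the point: the same ten input spikes, out of thousands, switch the neuron from nearly silent to reliably firing depending only on their relative timing.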

Rate vs. timing (VII) Marr's levels of analysis

In summary, the rate vs. timing debate is not about the timescale of description, but about the notion that neural activity and computation may be entirely and consistently defined by the time-varying rates r(t) in the network. In fact, it is interesting to cast this debate in the analysis framework proposed by David Marr. I have discussed this framework in other posts, but it is worth explaining it here again. Marr proposed that information processing systems can be analyzed at three levels:

1) The computational level: what does the system do? (for example: estimating the location of a sound source)

2) The algorithmic/representational level: how does it do it? (for example: by calculating the maximum of cross-correlation between the two monaural signals)

3) The physical level: how is it physically realized? (for example: with axonal delay lines and coincidence detectors)

Rate-based theories postulate that algorithms and representations can be defined independently of the spikes that instantiate them. Indeed, as I argued in previous posts, the instantaneous firing rate is not a physical quantity but an abstraction (in general, a probability of firing), and it is postulated that all algorithms can be defined at the level of rates, without loss. The conversion between spikes and rates is seen as independent of that level. In other words, the rate-based hypothesis is the postulate that the algorithmic and the physical levels are independent. In contrast, spike-based theories consider that these levels are not independent, i.e., that algorithms are defined at the spike level.

In the example of sound localization that I used in the description of the three levels, the binaural neuron implements the cross-correlation between the two monaural signals. This is possible if one assumes that the monaural signals are transduced into spikes through Poisson processes with rates equal to these signals, that the binaural neuron responds to coincidences, and that the result is the spike count of the binaural neuron. This is rate-based theory (even though it is based on coincidence detection). Alternatively, in Goodman & Brette (2010), signals are transduced into spikes through an essentially deterministic process, and the binaural neuron spikes to signal the similarity between the transduced signals (note that a single spike is meaningful here, see my post on the difference between correlation and synchrony). This is spike-based theory. It also makes a functional difference in the example I just described, because in the spike-based version, the neuron is also sensitive to interaural intensity differences.
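
Here is a toy version of the rate-based scheme just described (my own illustration with made-up parameters; it is neither the model of Goodman & Brette 2010 nor a realistic model of the auditory periphery). Two monaural “signals” – a fluctuating nonnegative envelope and a delayed copy of it – are transduced into Poisson spike trains with rates equal to the signals, and a bank of coincidence detectors, one per internal delay, counts coincident spikes. The expected count approximates the cross-correlation of the two signals, so the delay with the largest count estimates the interaural time difference; only spike counts matter in the end, which is why this remains a rate-based scheme despite the coincidence detection.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1e-3                               # 0.1 ms bins
duration = 5.0                            # 5 s of signal
n = int(round(duration / dt))
true_itd = 0.3e-3                         # interaural time difference: 0.3 ms
shift = int(round(true_itd / dt))

# Fluctuating nonnegative "envelope", used directly as a firing rate (spikes/s).
rate = rng.exponential(300.0, size=n)

# Poisson transduction (Bernoulli approximation in each small bin).
left = rng.random(n) < rate * dt
right = rng.random(n) < np.roll(rate, shift) * dt   # same envelope, delayed

# One coincidence detector per candidate internal delay; its "response"
# is simply its coincidence (spike) count over the whole stimulus.
delays = np.arange(-10, 11)
counts = [np.count_nonzero(left & np.roll(right, -d)) for d in delays]

best = delays[int(np.argmax(counts))] * dt
print(f"estimated ITD: {best * 1e3:.1f} ms (true: {true_itd * 1e3:.1f} ms)")
```

In the spike-based alternative described above, by contrast, transduction is essentially deterministic and a single binaural spike is already meaningful, rather than only the counts.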

When expressed as the independence between the algorithmic and the spike levels, the rate-based hypothesis seems like an ad hoc postulate. Why would evolution make it such that it is possible to describe neural algorithms in terms of rates? What would be the advantage from the organism’s point of view? This is why I see the rate-based hypothesis as a methodological postulate, rather than a true scientific hypothesis. That is, it is a postulate that makes it simpler for us, external observers, to describe and understand what neurons are doing. This is so because most of our calculus is based on operations on analog signals rather than on discrete entities (spikes). It is then hoped that this level of description is adequate, but there is no strong biological reason why it should be so. It just seems adequate enough to defenders of rate-based theories, and they endorse it because it is methodologically convenient.

This reminds me of discussions I have had with strong advocates of rate-based theories, who are also reasonable scientists. When faced with evidence and arguments that strongly suggest that rates cannot fully describe neural activity, they may agree. But they remain unconvinced, because they do not see why they should abandon a seemingly working theory (rate-based calculus) for a hypothesis that does not help them understand the system, even though it is more empirically valid (neural computation is based on spikes, but how exactly?). In other words, why bother with the extra complication of spikes? This is what I mean by a “methodological postulate”: it is not that, for empirical reasons, neurons are thought to discard any information about spike timing, but rather that it seems conceptually more convenient to think in terms of analog quantities rather than spikes.

This means that this debate will not be resolved by accumulating empirical evidence for or against either alternative. For defenders of spike-based theories, it can only be resolved by providing a convincing theory of spike-based computation that could replace rate-based calculus. For defenders of rate-based theories, the challenge is rather to find mechanisms by which neural activity can truly be reduced to calculus with analog signals – a difficult task, as I will show in the next posts.

What is computational neuroscience? (III) The different kinds of theories in computational neuroscience

Before I try to answer the questions I asked at the end of the previous post, I will first describe the different types of approaches in computational neuroscience. Note that this does not cover everything in theoretical and quantitative neuroscience (see my first post).

David Marr, a very important figure in computational neuroscience, proposed that cognitive systems can be described at three levels:

1) The computational level: what does the system do? (for example: estimating the location of a sound source)

2) The algorithmic/representational level: how does it do it? (for example: by calculating the maximum of cross-correlation between the two monaural signals)

3) The physical level: how is it physically realized? (for example: with axonal delay lines and coincidence detectors)

Theories in computational neuroscience differ by which level is addressed, and by the postulated relationships between the three levels (see also my related post).

David Marr considered that these three levels are independent. Francisco Varela described this view as “computational objectivism”. This means that the goal of the computation is defined in terms that are external to the organism. The two other levels describe how this goal is achieved, but they have no influence on what is achieved. It is implied that evolution shapes levels 2 and 3 by imposing the first level. It is important to realize that theories that follow this approach necessarily start from the highest level (defining the object of information processing), and only then analyze the lower levels. Such approaches can be restricted to the first level, or the first two levels, but they cannot address only the third level, or the second level, because these are defined by the higher levels. It can be described as a “top-down” approach.

The opposite view is that both the algorithmic and computational levels derive from the physical level, i.e., they emerge from the interactions between neurons. Varela described it as “neurophysiological subjectivism”. In this view, one would start by analyzing the third level, and then possibly go up to the higher levels – this is a “bottom-up” approach. This is the logic followed by the data-driven approaches that I criticized in my first post. I criticized it because this view fails to acknowledge the fact that living beings are intensely teleonomic, i.e., the physical level serves a project (invariant reproduction, in the words of Jacques Monod). This is not to say that function is not produced by the interaction of neurons – it has to be, in a materialist view. But as a method of scientific inquiry, analyzing the physical level independently of the higher levels, as if it were a non-living object (e.g. a gas), does not seem adequate – or at least it seems overly optimistic. As far as I know, this type of approach has produced theories of neural dynamics rather than theories of neural computation – for example, showing how oscillations or some other large-scale aspect of neural networks might emerge from the interaction of neurons. In other words, in Marr’s hierarchy, such studies are restricted to the third level. Therefore, I would categorize them as theoretical neuroscience rather than computational neuroscience.

These two opposite views roughly correspond to externalism and internalism in the philosophy of perception. It is important to realize that these are important philosophical distinctions, which have considerable epistemological implications, in particular on what is considered a “realistic” model. Computational objectivists would insist that a biological model must serve a function; otherwise it is simply not about biology. Neurophysiological subjectivists would insist that models must agree with certain physiological experiments; otherwise they are empirically wrong.

There is another class of approaches in the philosophy of perception, which can be seen as intermediate between these two: the embodied approaches. These consider that the computational level cannot be defined independently of the physical level, because the goal of computation can only be defined in terms that are accessible to the organism. In the more external views (Gibson/O’Regan), this means that the computational level actually includes the body, but the neural implementation is seen as independent of the computational level. For example, in Gibson’s ecological approach and in O’Regan’s sensorimotor theory, the organism looks for information about the world that is implicit in its sensorimotor coupling. This differs quite substantially from computational objectivism in the way the goal of the computation is defined. In computational objectivism, the goal is defined externally, for example: to estimate the angle between a sound source and the head. Sensorimotor theories acknowledge that the notion of “angle” is that of an external observer with some measurement apparatus; it cannot be that of an organism. Instead, in sensorimotor approaches, direction is defined subjectively (contrary to computational objectivism), but still in reference to an external world (contrary to neurophysiological subjectivism), as the self-generated movement that would make the sound move to the front (an arbitrary reference point). In the more internal views (e.g. Varela), the notion of computation itself is questioned, as it is considered that the goal is defined by the organism itself. This is Varela’s concept of autopoiesis, according to which a living entity acts so as to maintain its own organization. “Computation” is then a by-product of this process. This last class of approaches is currently less developed in computational neuroscience.

The three types of approaches I have described mostly concern the relationship between the computational and physical levels, and they are tightly linked with different views in the philosophy of perception. There is also another dividing line between theories of neural computation, which has to do with the relationship between the algorithmic and physical levels. This is the divide between rate-based and spike-based theories of neural computation (see my series of posts on the subject).

In Marr’s view, and in general in rate-based views, the algorithmic and physical levels are mostly independent. Because algorithms are generally described in terms of calculus with analog values, spikes are seen as implementing this analog calculus. In other words, spikes only reflect an underlying analog quantity, the firing rate of a neuron, on which the algorithms are defined. The usual view is that spikes are produced randomly with some probability reflecting the underlying rate (an abstract quantity).

On the contrary, another view holds that algorithms are defined at the level of spikes, not of rates. Such theories include the idea of binding by synchrony (Singer/von der Malsburg), in which neural synchrony is the signature of a coherent object, the related idea of synfire chains (Abeles), and more recently the theories developed by Sophie Denève and by myself (there is also Thorpe’s rank-order coding theory, but it is more on the side of coding than of computation). In the latter two theories, spiking is seen as a decision. In Denève’s approach, the neuron spikes so as to reduce an error criterion. In my recent paper on computing with synchrony, the neuron spikes when it observes unlikely coincidences, which signals some invariant structure (in the sense of Gibson). In both cases, the algorithm is defined directly at the level of spikes.

In summary: theories of neural computation can be classified according to the implicit relationships between the three levels of analysis described by Marr. It is important to realize that these are not purely scientific differences (by this, I mean that they are not simply empirical disputes), but philosophical and/or epistemological differences. In my view this is a big issue for the peer-reviewing system, because it is difficult to have a paper accepted when the reviewers or editors do not share the same epistemological views.