What is computational neuroscience? (IV) Should theories explain the data?

Since there is such an obvious answer, you might anticipate that I am going to question it! More precisely, I am going to analyze the following statement: a good theory is one that explains the maximum amount of empirical data while being as simple as possible. I will argue that 1) this is not stupid at all, but that 2) it cannot be a general criterion to distinguish good and bad theories, and finally that 3) it is only a relevant criterion for orthodox theories, i.e., theories that are consistent with the theories that produced the data. The arguments are not particularly original; I will mostly summarize points made by a number of philosophers.

First of all, given a finite set of observations, there are an infinite number of universal laws that agree with the observations, so the problem is underdetermined. This is the skeptical criticism of inductivism. Which theory to choose then? One approach is "Occam's razor", i.e., the idea that among competing hypotheses, the most parsimonious one should be preferred. But of course, Karl Popper and others would argue that it cannot be a valid criterion to distinguish between theories, because it could still be that the more complex hypothesis predicts future observations better than the simpler one - there is just no way to know without doing the new experiments. Yet it is not absurd as a heuristic to develop theories. This is a known fact in the field of machine learning for example, related to the problem of "overfitting". If one wants to describe the relationship between two quantities x and y from a set of n examples (xi, yi), one could perfectly fit a polynomial of degree n-1 to the data. It would completely explain the data, yet it would be very unlikely to fit a new example. In fact, a lower-dimensional relationship is more likely to account for new data, and this can be shown more rigorously with the tools of statistical learning theory. Thus there is a trade-off between how much of the data is accounted for and the simplicity of the theory. So Occam's razor is actually a very sensible heuristic to produce theories. But it should not be confused with a general criterion to discard theories.
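
To make this concrete, here is a minimal sketch in Python (the data are made up, assuming a simple linear relationship corrupted by noise): the polynomial that fits the n observations exactly explains the data perfectly, but generalizes worse than a much simpler fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y = x plus noise (an assumption chosen only for illustration)
n = 10
x = np.linspace(0, 1, n)
y = x + 0.1 * rng.standard_normal(n)

# New examples drawn from the same underlying relationship
x_new = rng.uniform(0, 1, 100)
y_new = x_new + 0.1 * rng.standard_normal(100)

for degree in (1, 3, n - 1):   # degree n-1 interpolates the n points exactly
    coeffs = np.polyfit(x, y, degree)
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_error = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree}: error on the data {train_error:.4f}, on new examples {test_error:.4f}")
```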

The interim conclusion is: a theory should account for the data, but not at the expense of being as complicated as the data itself. Now I will make deeper criticisms, mostly based on post-Popper philosophers such as Kuhn, Lakatos and Feyerabend. In a nutshell, the argument is that insisting that a theory should explain empirical data is a kind of inversion of what science is about. Science is about understanding the real world, by making theories and testing them with carefully designed experiments. These experiments are usually done in conditions that are very unecological, and this is justified by the fact that they are designed to test a specific hypothesis in a controlled way. For example, the laws of mechanics would be tested in conditions where there is no friction, a condition that almost never occurs in the real world - and this is absolutely fine methodology. But then insisting that a new theory should be evaluated by how much of the empirical data it explains is what I would call the "empiricist inversion": empirical data were produced, using very peculiar conditions justified by the theory that motivated the experiments, and now we demand that any theory should explain these data. One obvious point, which was made by Kuhn and Feyerabend, is that it gives a highly unfair advantage to the first theory, just because it was there first. But it is actually worse than this, because it also means that the criterion to judge theories is now disconnected from what was meant to be explained in the first place by the theory that produced the data. Here is the empiricist inversion: we consider that theories should explain data, when actually data are produced to test theories. What a theory is meant to explain is the world; data are only used as a methodological tool to test theories of the world.

In summary, this criterion then tends to produce theories of data, not theories of the world. This point in fact relates to the arguments of Gibson, who criticized psychological research for focusing on laboratory stimuli rather than ecological conditions. Of course simplified laboratory stimuli are used to control experiments precisely, but it should always be kept in mind that these simplified stimuli are used as methodological tools and not as the things that are meant to be explained. In neural modeling, I find that many models are developed to explain experimental data, ignoring the function of the models (i.e., the “computational level” in Marr’s analysis framework). In my view, this is characteristic of the empiricist inversion, which results in models of the data, not models of the brain.

At this point, my remarks might start being confusing. On one hand I am saying that it is a good idea to try to account for the data with a simple explanation, on the other hand I am saying that we should not care so much about the data. These seemingly contradictory statements can still make sense because they apply to different types of theories. This is related to what Thomas Kuhn termed “normal science” and “revolutionary science”. These terms might sound a bit too judgmental so I will rather speak of “orthodox theories” and “non-orthodox theories”. The idea is that science is structured by paradigm shifts. Between such shifts, a central paradigm dominates. Data are obtained through this paradigm, anomalies are also explained through this paradigm (rather than being seen as falsifications), and a lot of new scientific results are produced by “puzzle solving”, i.e., trying to explain data. At some point, for various reasons (e.g. too many unexplained anomalies), the central paradigm shifts to a new one and the process starts again, but with new data, new methods, or new ways to look at the observations.

“Orthodox theories” are theories developed within the central paradigm. These try to explain the data obtained with this paradigm, the “puzzle-solving” activity. Here it makes sense to consider that a good theory is a simple explanation of the empirical data. But this kind of criterion cannot explain paradigm shifts. A paradigm shift requires the development of non-orthodox theories, for which the existing empirical data may not be adequate. Therefore the making of non-orthodox theories follows a different logic. Because the existing data were obtained with a different paradigm, these theories are not driven by the data, although they may be motivated by some anomalous set of data. For example they may be developed from philosophical considerations or by analogy. The logic of their construction might be better described by counter-induction rather than induction (a concept proposed by Feyerabend). That is, their development starts from a theoretical principle, rather than from data, and existing data are deconstructed so as to fit the theory. By this process, implicit assumptions of the central paradigm are uncovered, and this might ultimately trigger new experiments and produce new experimental data that may be favorable to the new theory.

Recently, there have been a lot of discussions in the fields of neuroscience and computational neuroscience about the availability of massive amounts of data. Many consider it a great opportunity, which should change the way we work and build models. It certainly seems like a good thing to have more data, but I would like to point out that it mostly matters for the development of orthodox theories. Putting too much emphasis (and resources) on it also raises the danger of driving the field away from non-orthodox theories, which in the end are the ones that bring scientific revolutions (with the caveat that of course most non-orthodox theories turn out to be wrong). Being myself unhappy with current orthodox theories in neuroscience, I see this danger as quite significant.

This was a long post and I will now try to summarize. I started with the provocative question: should a theory explain the data? First of all, a theory that explains every single bit of data is an enumeration of data, not a theory. It is unlikely to predict any new significant fact. This point is related to overfitting or the “curse of dimensionality” in statistical learning. A better theory is one that explains a lot of the data with a simple explanation, a principle known as Occam’s razor. However, this criterion should be thought of as a heuristic to develop theories, not a clear-cut general decision criterion between theories. In fact, this criterion is relevant mostly for orthodox theories, i.e., those theories that follow the central paradigm with which most data have been obtained. Non-orthodox theories, on the other hand, cannot be expected to explain most of the data obtained through a different paradigm (at least initially). It can be seen that in fact they are developed through a counter-inductive process, by which data are made consistent with the theory. This process may fail to produce new empirical facts consistent with the new theory (most often) or it may succeed and subsequently become the new central paradigm - but this is usually a long process.

The intelligence of slime molds

Slime molds are fascinating: these are unicellular organisms that can display complex behaviors such as finding the shortest path in a maze and developing an efficient transportation network. Actually, each of these two findings generated a high-impact publication (Science and Nature) and an Ig Nobel prize. In the latter study, the authors grew a slime mold on a map of Japan, with food placed on the biggest cities, and demonstrated that it developed a transportation network that looked very much like the railway network of Japan (check out the video!).

More recently, there was a PNAS paper in which the authors showed that a slime mold can solve the “U-shaped trap problem”. This is a classic spatial navigation problem in robotics: the organism starts inside a U-shaped barrier and the food lies on the other side of it. The organism cannot navigate to the food using local rules (e.g. following a path along which the distance to the food continuously decreases), and therefore the task requires some form of spatial memory. This is not a trivial task for robots, but the slime mold can do it (check out the video).

What I find particularly interesting is that the slime mold has no brain (it is a single cell!), and yet it displays behavior that requires some form of spatial memory. The way it manages to do the task is that it leaves extracellular slime behind it and uses it to mark the locations it has already visited. It can then explore its environment by avoiding extracellular slime, and it can go around the U-shaped barrier. Thus it uses an externalized memory. This is a concrete example that shows that (neural) representation is not always necessary for complex cognition. It nicely illustrates Rodney Brooks’s famous quote: “The world is its own best model”. That is, why develop a complex map of the external world when you can directly interact with it?
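
As a side note, the strategy is easy to reproduce in a toy simulation. Nothing below is taken from the actual experiments: the grid, the barrier and the movement rule are invented for illustration. The point is only that marking visited locations and avoiding them is enough to get around a U-shaped barrier with purely local decisions.

```python
# A toy sketch of navigation with an externalized memory, in the spirit of the slime
# mold strategy (the grid, barrier shape and movement rule are made-up assumptions).
grid = [
    "..........",
    "..#....#..",
    "..#.S..#..",
    "..#....#..",
    "..######..",
    "..........",
    "....F.....",
]
walls = {(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "#"}
start = next((r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "S")
food = next((r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "F")

pos, slime = start, {start}            # "slime" marks locations already visited
for step in range(500):
    if pos == food:
        print(f"food reached in {step} steps without any internal map")
        break
    r, c = pos
    neighbors = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0])
                 and (r + dr, c + dc) not in walls]
    # Local rule: prefer slime-free locations; among candidates, move toward the food.
    fresh = [n for n in neighbors if n not in slime]
    candidates = fresh if fresh else neighbors
    pos = min(candidates, key=lambda n: abs(n[0] - food[0]) + abs(n[1] - food[1]))
    slime.add(pos)
```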

Of course, we humans don’t usually leave slime on the floor to help us navigate. But this example should make us think about the nature of spatial memory. We tend to think of spatial memory in terms of maps, in analogy with actual maps that we can draw on paper. However, it is now possible to imagine other ways in which a spatial memory could work, in analogy with the slime mold. For example, one might imagine a memory system that leaves “virtual slime” in places that have already been explored, that is, that associates environmental cues about location with a “slime signal”. This would confer the same navigational abilities as those of slime molds, without a map-like representation of the world. For the organism, having markers in the hippocampus (the brain area involved in spatial memory) or outside the skull might not make a big difference (does the mind stop at the boundary of the skull?).

It is known that in mammals, there are cells in the hippocampus that fire at a specific (preferred) location. These are called “place cells”. How about if the meaning of spikes fired by these place cells were that there is “slime” in their favorite place? Of course I realize that this is a provocative question, which might not go so well with other known facts about the hippocampus, such as grid cells (cells that fire when the animal is at nodes of a regular spatial grid). But it makes the point that maps, in the usual sense, may not be the only way in which these experimental observations can be interpreted. That is, the neural basis of spatial memory could be thought of as operational (neurons fire to trigger some behavior) rather than representational (the world is reconstructed from spike trains).

Rate vs. timing (X) Rate theories in spiking network models

According to the rate-based hypothesis, 1) neural activity can be entirely described by the dynamics of underlying firing rates and 2) firing is independent between neurons, conditional on these rates. This hypothesis can be investigated in models of spiking neural networks with a self-consistency strategy. If all inputs to a neuron are independent Poisson processes, then the output firing rate can be calculated as a function of the input rates. The rates in the network are then solutions of a fixed-point equation. This has been investigated in random networks, in particular by Nicolas Brunel. In a statistically homogeneous network, the theory gives the stationary firing rate, which can be compared to numerical simulations. The approach has also been applied to calculate self-sustained oscillations (time-varying firing rates) in such networks. In general, the theory works nicely for sparse random networks, in which a pair of neurons is connected with low probability. Sparseness implies that there are no short cycles in the connectivity graph, so that the strong dependence between the inputs to a neuron and its output has little impact on the dynamics. Results of simulations diverge from the theory when the connection probability increases. This means that the rate-based hypothesis is not true in general; rather, it relies on specific assumptions.
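
To give a flavor of the self-consistency strategy, here is a minimal sketch. It does not use the actual diffusion-approximation formulas of the Brunel analysis; the single-neuron input-output function and all parameter values are hypothetical, chosen only to illustrate the fixed-point logic.

```python
import numpy as np

# A minimal sketch of the self-consistency idea (not Brunel's actual
# diffusion-approximation formulas): the single-neuron input-output function f
# below is a hypothetical sigmoid chosen only for illustration.

C_E, C_I = 400, 100     # excitatory / inhibitory synapses per neuron (assumed)
J, g = 0.1, 5.0         # synaptic weight (mV) and relative inhibition strength (assumed)
nu_ext = 8.0            # external input rate per synapse, in Hz (assumed)
tau = 0.02              # membrane time constant, in s (assumed)

def f(mu):
    """Hypothetical single-neuron rate (Hz) as a function of mean drive mu (mV)."""
    return 50.0 / (1.0 + np.exp(-(mu - 10.0)))

def mean_drive(nu):
    """Mean input when all recurrent neurons fire at rate nu (Hz)."""
    return tau * J * ((C_E - g * C_I) * nu + C_E * nu_ext)

# Fixed-point iteration: the network rate must reproduce itself, nu = f(mean_drive(nu))
nu = 1.0
for _ in range(100):
    nu = f(mean_drive(nu))
print(f"self-consistent stationary rate: {nu:.2f} Hz")
```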

Real neural networks do not look like sparse random networks: for example, they can be strongly connected locally, and neurons can be bidirectionally connected or form clusters. Recently, there have been a number of nice theoretical papers on densely connected balanced networks (Renart et al., Science 2010; Litwin-Kumar and Doiron, Nat Neurosci 2012), which a number of people have interpreted as supporting rate-based theories. In such networks, when inhibition precisely counteracts excitation, excitatory correlations (due to shared inputs) are cancelled by the coordination between inhibition and excitation. As a result, there are very weak pairwise correlations between neurons. I hope it is now clear from my previous posts that this is not an argument in favor of rate-based theories. The fact that correlations are small says nothing about whether the dynamics can be faithfully described by underlying time-varying rates.

In fact, in such networks, neurons are in a fluctuation-driven regime, meaning that they are highly sensitive to coincidences. What inhibition does is to cancel the correlations due to shared inputs, i.e., the meaningless correlations. But this is precisely what one would want the network to do in spike-based schemes based on stimulus-specific synchrony (detecting coincidences that are unlikely to occur by chance) or on predictive coding (firing when there is a discrepancy between input and prediction). In summary, these studies do not support the idea that rates are an adequate basis for describing network dynamics. They show how it is possible to cancel expected correlations, a useful mechanism in both rate-based and spike-based theories.

Update. These observations highlight the difference between correlation and synchrony. Correlations are meant as temporal averages, for example pairwise cross-correlations. But on a timescale relevant to behavior, temporal averages are irrelevant. What might be relevant are spatial averages. Synchrony, in contrast, generally means that a number of neurons fire at the same time, or that a number of spikes arrive at the same time at a postsynaptic neuron. This is a transient event, which may not be repeated. A single event is meaningful if such synchrony (possibly involving many neurons) is unlikely to occur by chance. The term “by chance” refers to what could be expected given the past history of spiking events. This is precisely what coordinated inhibition may correspond to in the scheme described above: the predicted level of input correlations. In this sense, inhibition can be tuned to cancel the expected correlations, but by definition it cannot cancel coincidences that are not expected. Thus, the effect of such excitation-inhibition coordination is precisely to enhance the salience of unexpected synchrony.
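
A small sketch may help make the distinction concrete (all numbers are arbitrary): a single synchrony event involving many neurons barely changes the time-averaged pairwise correlation, yet it stands out clearly against chance when spikes are counted across neurons within a single time bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: N neurons firing as independent Poisson processes,
# except for a single injected synchrony event. All numbers are illustrative.
N, rate, dt, duration = 100, 5.0, 0.001, 10.0      # Hz, s, s
steps = int(duration / dt)
spikes = rng.random((N, steps)) < rate * dt        # boolean spike matrix
spikes[:30, 5000] = True                           # one transient event: 30 neurons fire together

# Time-averaged pairwise correlation between two neurons: near zero,
# the single event is diluted by the temporal average.
c = np.corrcoef(spikes[0].astype(float), spikes[1].astype(float))[0, 1]
print(f"time-averaged pairwise correlation: {c:.4f}")

# Spatial average in each time bin: how many neurons fire together?
counts = spikes.sum(axis=0)
expected = N * rate * dt                           # chance level per bin
print(f"expected count per bin: {expected:.2f}, maximum observed: {counts.max()}")
# The injected event stands out as a count far above chance, even though it
# occurs only once and leaves pairwise temporal correlations near zero.
```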

Rate vs. timing (IX) The fluctuation-driven regime and the Softky-Shadlen debate

In the 1990s, there was a famous published exchange about the rate vs. timing debate, between Softky and Koch on one side, and Shadlen and Newsome on the other side. Softky and Koch argued that if spike trains were random, as they seemed to be in single unit recordings, and cortical neurons sum many inputs, then by the law of large numbers their output should be regular, since the total input would be approximately constant. Therefore, so they argued, there is an inconsistency between the two hypotheses (independence of inputs and integration). They proposed to resolve it by postulating that neurons do not sum their inputs but rather detect coincidences at a millisecond timescale, using dendritic nonlinearities. Shadlen and Newsome demonstrated that the two hypotheses are in fact not contradictory, if one postulates that the total mean input is subthreshold, so that spikes are only triggered when fluctuations drive the total input above threshold. This is called the “fluctuation-driven regime”, and it is a fairly well accepted hypothesis nowadays. When there are many inputs, this can happen when excitation is balanced by inhibition, hence the other standard name “balanced regime” (note that balanced implies fluctuation-driven, but not the other way round). An electrophysiological signature of this regime is a distribution of membrane potential that peaks well below threshold (instead of increasing monotonically towards threshold).
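
Here is a minimal sketch of this regime (all parameter values are assumptions chosen for illustration, not taken from the original papers): a leaky integrate-and-fire neuron receives many excitatory and inhibitory Poisson inputs whose mean drive is subthreshold, so that spikes are caused by fluctuations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of a leaky integrate-and-fire neuron in the fluctuation-driven regime.
# All parameter values are illustrative assumptions, not fits to any data.
tau, v_rest, v_thresh, v_reset = 0.02, 0.0, 10.0, 0.0   # s, mV
w_e, w_i = 0.2, -0.73                                   # synaptic weights (mV)
N_e, N_i, rate = 4000, 1000, 5.0                        # synapse counts, input rate (Hz)
dt, duration = 1e-4, 10.0                               # s
steps = int(duration / dt)

exc = rng.poisson(N_e * rate * dt, steps)               # excitatory input spike counts per bin
inh = rng.poisson(N_i * rate * dt, steps)               # inhibitory input spike counts per bin

v, spikes, v_trace = v_rest, [], np.empty(steps)
for i in range(steps):
    v += dt / tau * (v_rest - v) + w_e * exc[i] + w_i * inh[i]
    if v > v_thresh:
        spikes.append(i * dt)
        v = v_reset
    v_trace[i] = v

isis = np.diff(spikes)
print(f"output rate: {len(spikes) / duration:.1f} Hz, CV of ISIs: {isis.std() / isis.mean():.2f}")
print(f"mean membrane potential: {v_trace.mean():.1f} mV, threshold: {v_thresh} mV")
# The mean potential sits below threshold and output spiking is irregular,
# even though the neuron deterministically integrates its inputs.
```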

In the fluctuation-driven regime, output spikes occur irregularly, because the neuron only spikes when there is a fluctuation of the summed input. Thus the two hypotheses are not contradictory: it is completely possible that a neuron receives independent Poisson inputs, integrates them, and fires in a quasi-Poisson way. This argument indeed makes the submillisecond coincidence detection hypothesis unnecessary. However, Softky then correctly argued that even then, output spikes are still determined by input spikes, so they cannot be seen as random. To be more precise: input spike trains are independent Poisson processes, the output spike train is (approximately) a Poisson process, but inputs and outputs are not independent. In their reply, Shadlen and Newsome miss this argument. They show that if they replay the same pattern of spikes to the neuron that led to a spike, but with a different history of inputs, then the neuron may not spike. This happened in their model for two reasons: 1) they used a variation of the perfect integrator, a very particular kind of model that is known to be unreliable, contrary to almost every other spiking neuron model, and to actual neurons (Brette & Guigon 2003), 2) they considered a pattern of input spikes restricted to a window much shorter than the integration time constant of the neuron. If they had played a pattern covering one integration time window to a standard integrate-and-fire model (or any other model), then they would have seen output spikes. But perhaps more importantly, even if the input pattern is restricted either temporally or to a subset of synapses, the probability that the neuron fires is much higher than chance. In other words, the output spike train is not independent of any of the input spike trains. This would appear in a cross-correlogram between any input and the output, as an extra firing probability at positive lags, on the timescale of the integration time constant, with a correlation of order 1/N (since there is 1 output spike for N input spikes, assuming identical rates).

Note that this is a trivial mathematical fact, if the output depends deterministically on the inputs. Yet it is a critical point in the debate. Consider: we have here an elementary example in which all inputs are independent Poisson processes with the same constant firing rate, and the output is also a (quasi-) Poisson process with constant rate, yet the fact that one input neuron spikes is informative about whether the output neuron will spike shortly after, even conditional on the knowledge of the rates. In other words, rates do not fully describe the (joint) activity of the network. This is a direct contradiction of the rate-based postulate.

Even though this means that the rate-based hypothesis is mathematically wrong (at least in this case), it may still be a good enough approximation. If one input spike is known, one gets a little bit of extra information about whether the output neuron spikes, compared to the sole knowledge of the rates. Maybe this is a slight discrepancy. But consider: if all input spikes are known, one gets full information about the output spikes, since the process is deterministic and reliable. This is a very strong discrepancy with the rate-based hypothesis. One may ask the question: if I observe p input spikes occurring together, how much can I predict about output spiking? This is the question we tried to answer in Rossant et al. (2011), and it follows an argument proposed by Abeles in the 1980s. In a fluctuation-driven regime, if one observes just one input spike, chances are that the membrane potential is far from threshold, and the neuron is very unlikely to fire. But if, say, 10 spikes are observed, each producing a 1 mV depolarization, and the threshold is about 10 mV above the mean potential, then there is a 50% chance of observing an output spike. Abeles called the ratio between the extra firing produced by 10 coincident spikes and that produced by 10 independent spikes the “coincidence advantage”, and it is a huge number. Consider again: if you only know the input rates, then there is a 5% chance of observing a spike in a 10 ms window, for an output neuron firing at 5 Hz; if you additionally know that 10 coincident spikes have been fired, then there is a 50% chance of observing an output spike. This is a huge change, involving the observation of just 0.1% of all synapses (assuming 10,000 synapses).
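
The numbers above can be reproduced with a back-of-the-envelope calculation, assuming (as a simplification) that the membrane potential fluctuates as a Gaussian around its mean; the 1 mV PSP size and the 10 mV distance to threshold come from the text, the noise amplitude is assumed.

```python
from scipy.stats import norm

# Back-of-the-envelope version of the coincidence advantage, assuming the membrane
# potential fluctuates as a Gaussian around its mean (a simplifying assumption).
sigma = 3.0          # SD of membrane potential fluctuations, in mV (assumed)
threshold = 10.0     # distance from mean potential to threshold, in mV (from the text)
psp = 1.0            # depolarization per input spike, in mV (from the text)

p_baseline = norm.sf(threshold, loc=0.0, scale=sigma)
p_coincident = norm.sf(threshold, loc=10 * psp, scale=sigma)   # 10 coincident spikes
print(f"P(V > threshold) with no extra input:      {p_baseline:.4f}")
print(f"P(V > threshold) after 10 coincident PSPs: {p_coincident:.4f}")
# With the mean pushed exactly to threshold, the firing probability is about 0.5,
# versus a small baseline probability: a large "coincidence advantage".
```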

Thus, it is difficult to argue here that rates entirely determine the activity of the network. Simply put, the fact that the input-output function of neurons is essentially deterministic introduces strong correlations between input and output spike trains. It is a simple fact, and it is well known in the theoretical literature on neural network dynamics. For example, one line of research, initiated mainly by Nicolas Brunel, tries to determine the firing rates (average and time-varying) of networks of spiking models, using a self-consistent analysis. It is notably difficult to do this in general in the fluctuation-driven regime, because of the correlations introduced by the spiking process. To solve it, the standard hypothesis is to consider sparse networks with random connectivity. This ensures that there are no short cycles in the connectivity graph, and therefore that the inputs to a given neuron are approximately independent. But the theoretical predictions break down when this hypothesis is not satisfied. It is in fact a challenge in theoretical neuroscience to extend this type of analysis to networks with realistic connectivity – i.e., with short cycles and non-random connectivity.

It is interesting to note that the concept of the balanced or fluctuation-driven regime was proposed in the 1990s as a way to support rate-based theories. In fact, analysis shows that it is specifically in this regime, and not in the mean-driven regime, that 1) neurons are essentially deterministic, 2) neurons are highly sensitive to the relative timing of input spikes, 3) there is a strong coordination between input and output spikes. The rate-based postulate is not valid at all in this regime.

Rate vs. timing (VIII) A summary of arguments and evidence

I have identified the rate-based hypothesis as a methodological postulate, according to which neural activity can entirely be described by underlying rates, which are abstract variables. In general, individual spikes are then seen as instantiations of a random point process with the given rates. It can also be seen as a hypothesis of independence between the algorithmic and physical levels, in David Marr’s terminology. On the contrary, spike-based theories consider that algorithms are defined at the spike level.

What is the empirical evidence? I will start by showing that most arguments that have been used in this debate do not actually help us distinguish between the two alternatives.

Variability. Perhaps the most frequently used argument against spike-based theories is the fact that spike trains in vivo are variable both temporally and across trials, and yet this might well be the least relevant argument. I addressed this point in detail in my third post, so I will only briefly summarize. Within one recording, inter-spike intervals (ISIs) are highly variable. This has been used as a sign that spike trains are instantiations of random point processes. But variability of ISIs is also a requirement of spike-based theories, because it increases the amount of information available in spike timing (mathematically, for a given mean rate, the entropy of the ISI distribution is maximal for Poisson processes). More generally, temporal or spatial variability can never be a way to distinguish between random and deterministic schemes, because the entropy of a distribution reflects either the information content (if it is used for the code) or the amount of randomness (if it cannot be used). This brings us to the argument of variability across trials, that is, the lack of reproducibility of neural responses to the same stimulus. But this is a category error: these observations tell us that responses are stochastic, not that the activity can be fully described by rates. Therefore, it is an argument in the stochastic vs. deterministic debate, not in the rate vs. spike debate. In addition, it is a weak argument because only the stimulus is controlled. The state of the brain (e.g. due to attention, or any other aspect of the network dynamics) is not. In some cases, the sensory inputs themselves are not fully controlled (e.g. eye movements in awake animals). Therefore, the lack of reproducibility may reflect either true stochasticity or simply uncertainty about uncontrolled variables, which may still be accessible to the nervous system. The lack of reproducibility itself is also contentious, at least in subcortical and primary cortical areas. But since I pointed out that this is not a very relevant argument anyway, I will not comment on this evidence (although I should note that strong reproducibility at the spike level would be an argument against rate-based theories).

Chaos. Related to the variability arguments is the chaos argument (see my related post). It has been claimed that neural networks are chaotic. This is an interesting point, because it has been used to argue in favor of rate-based theories when really it is an argument against them. What chaos implies is an absence of reproducibility of neural responses to a given stimulus. As I argued in the previous paragraph, by itself the argument has no value in the rate vs. spike debate. But if it is true that the lack of reproducibility is due to chaotic dynamics, then this goes against the rate-based hypothesis. Indeed, chaotic systems are deterministic; they cannot be described as random processes. In particular, the variables are not independent, and trajectories live in lower-dimensional spaces (attractors). I am not convinced that network dynamics are truly chaotic (although they might be), but if they are, then defenders of rate-based theories should rather be worried.

Selectivity curves. The concept of the selectivity curve or tuning curve has been used extensively in research on sensory systems (e.g. the visual cortex). It has been found, for example, that many cells in the primary visual cortex fire more in response to a moving bar or grating with a specific orientation. This observation is often reported as the statement that these neurons code for orientation. Implicitly, this means that the firing rate of these neurons contains information about orientation, and that this is the information used by the rest of the system. However, this is not what these experiments tell us. They only tell us that the firing rate covaries with stimulus orientation, nothing more. This cannot be an argument for rate-based theories, because in spike-based theories the firing rate also varies with stimuli (see my specific post). Indeed, processing stimuli with spikes requires producing spikes, and so stimulus-dependent variations in firing rate are a necessary correlate of spike-based computation. It is useful to interpret spike counts in terms of energy consumption, and with this notion in mind, what orientation selectivity curves tell us is not that the cells code for orientation, but rather that they care about orientation (or about a specific orientation). This is still quite an informative statement, but it does not tell us anything about whether the firing rate is the right quantity to look at.

Fast processing. To be fair, I will now critically examine an argument that has been used to contradict rate-based theories. It has been shown with psychophysical experiments that complex visual tasks can be performed by humans in very little time, so little time that any neuron along the processing chain may only fire once or not at all. This observation contradicts any scheme based on counting spikes over time, but it does not contradict views based on rate as a firing probability or as a spatial average – however, it does impose constraints on these views. It also rules out schemes based on interspike intervals. In other words, it discards computing schemes based on information obtained within single neurons (interspike interval or spike count) rather than across neurons (relative timing or population firing rate).

High correlations. A number of studies claim that, in some cases, there are significant correlations between neural responses. For example, neurons of the LGN that share a presynaptic retinal ganglion cell tend to fire synchronously, at a short timescale. This contradicts one of the claims of rate-based theories, that firing between neurons is independent, conditional on the underlying rates. Other studies have shown oscillations that organize spiking at different timescales (e.g. in the visual cortex and in the hippocampus). These observations may be seen as contradicting rate-based theories (especially the former), but it could be opposed that 1) these correlations may still not have a big impact on neural dynamics, and 2) even if they do, it is a minor modification to rate-based theory if they do not depend systematically on the stimulus. For example, opponents of oscillation-based theories would argue that oscillations are a by-product of the fact that networks are recurrent, and as feedback systems they can develop oscillations, which bear no functional significance. In the same way, fine-scale correlations between neighboring LGN neurons may result from anatomical factors, but they may only contribute to amplifying the thalamic input to the cortex – not a fundamental change in rate-based theory. But there are now a number of studies, in the cortex (e.g. from Singer’s lab) and in the hippocampus (e.g. from Buzsaki’s lab), that show a systematic relationship between functional aspects and oscillatory properties. Fine-scale correlations have not been studied so extensively in relationship to stimulus properties, but recently there was a study showing that the correlation between two neighboring LGN neurons in response to oriented gratings is tuned to orientation (Stanley et al. 2012). These cells project to cortical neurons in V1, whose firing rate is tuned to orientation. Thus, there is pretty clear evidence that correlations can be stimulus-dependent. The main question, then, is whether these correlations actually make a difference. That is, does the firing rate of a neuron depend mainly on the underlying rates of the presynaptic neurons, or can fine-scale correlations (or, say, a few individual spikes) make a difference? I will come back to this question in more detail below.

Low correlations. Before I discuss the impact of correlations on neural firing, I will also comment on the opposite line of arguments. A few recent studies have actually claimed that there are weak correlations between cortical neurons. First of all, the term “weak” is generally vague: weak compared to what? Is 0.1 a weak or a strong correlation? Such unqualified statements are subjective. One would intuitively think that 0.01 is a very weak correlation, in the sense that it is probably as if it were zero. But this is mere speculation. Another statement might be that correlations are not statistically significant. This statement is objective, but not conclusive. It only means that positive correlations could not be detected given the duration of the recordings, which amounts to saying that correlations are smaller than the smallest value that could have been measured. This is not more informative than saying that there is (say) a 0.1 correlation – it is even less informative, if this detection limit is not stated. So is 0.1 a weak or a strong pairwise correlation? The answer is, in general, that it is a huge correlation. As argued in Rossant et al. (2011), correlations make a huge difference to postsynaptic firing unless they are negligible compared to 1/N, where N is the number of synapses of the cell. So for a typical cortical neuron, this would mean negligible compared to 0.0001. The argument is very simple: independent inputs contribute to the membrane potential variance as N, but correlated inputs as c·N², where c is the pairwise correlation. The question, in fact, is rather how to deal with such huge correlations (more on this below).
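
This scaling argument can be checked directly with a short simulation. The shared-Gaussian construction below is just one convenient way to produce a given pairwise correlation; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Variance of the summed input from N unit-variance sources with pairwise correlation c:
# Var(sum) = N + c*N*(N-1), so even a tiny c dominates as soon as c*N is large.
# Correlations are induced here by a shared Gaussian component (an illustrative construction).
N, c, trials = 10000, 0.001, 500

shared = rng.standard_normal((trials, 1))
correlated = np.sqrt(c) * shared + np.sqrt(1 - c) * rng.standard_normal((trials, N))
independent = rng.standard_normal((trials, N))

print(f"independent inputs: Var(sum) ~ {independent.sum(axis=1).var():.0f} (theory {N})")
print(f"correlated, c={c}: Var(sum) ~ {correlated.sum(axis=1).var():.0f} "
      f"(theory {N + c * N * (N - 1):.0f})")
```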

Below I will discuss in a little more detail the impact of correlations on postsynaptic firing, but before that, I would first like to stress two important facts: 1) the presence of pairwise correlations is not critical to all spike-based theories, and 2) in those spike-based theories that do rely on synchrony, relative spike timing is stimulus-dependent and possibly transient. Indeed, there are prominent spike-based theories based on asynchrony rather than synchrony. For example, the rank-order theory (e.g. from Thorpe’s lab) proposes that information is encoded in the relative activation order of neurons, and there is no particular role for synchrony. The theory does not predict a high amount of correlations. However, this rank-order information may still manifest itself in cross-correlograms, as a stimulus-dependent asymmetry. Another example is the predictive spike coding theory defended by Sophie Denève, in which neurons fire when a specific criterion is fulfilled, so as to minimize an error. This predicts that neurons fire asynchronously - in fact in a slightly anti-correlated way. Finally, even in those theories based on synchrony, such as the one I presented recently (Brette 2012), neurons are not correlated in general. In the theory I proposed, synchrony is an unlikely event, which is detected by neurons. It is precisely because it is unlikely that it is meaningful – in this case, it signals some structure that is unlikely to be observed by chance. I have to acknowledge, however, that when a structured stimulus is presented, specific neuron groups fire in synchrony throughout the duration of the stimulus. I actually do not think that it should necessarily be the case (except for the binaural system). Pushing the theory further, I would argue that once the stimulus structure is established and recognized, it is not unlikely anymore, and therefore only the onset of the synchrony event is meaningful and required by the theory. Therefore, the prediction of the theory is rather that there are transient synchrony events, associated with specific properties of stimuli, which have an impact on target neurons. To summarize, spike-based theories do not generally predict strong correlations, and none of these theories predicts correlations in spontaneous activity.

This post is already long, so I will finish with a brief discussion of the impact of correlations on postsynaptic firing – a longer one in the next post. As I mentioned above, very small pairwise correlations have a huge impact on postsynaptic firing. To be negligible, they should be small compared to 1/N, where N is the number of synapses of the postsynaptic neuron. Another way to look at it, which is discussed in detail in Rossant et al. (2011), is that changing the timing of a few spikes (on the order of 10 synapses, out of 10,000) has a dramatic effect on postsynaptic firing (i.e., from silent to strongly firing). This point was already made in the 1980s by Abeles. The phenomenon occurs specifically in the fluctuation-driven regime, so in the next post I will describe this regime and what it means for the debate.

Rate vs. timing (VII) Marr's levels of analysis

In summary, the debate of rate vs. timing is not about the description timescale, but about the notion that neural activity and computation may be entirely and consistently defined by the time-varying rates r(t) in the network. In fact, it is interesting to cast this debate in the analysis framework proposed by David Marr. I have discussed this framework in other posts, but it is worth explaining it here again. Marr proposed that information processing systems can be analyzed at three levels:

1) The computational level: what does the system do? (for example: estimating the location of a sound source)

2) The algorithmic/representational level: how does it do it? (for example: by calculating the maximum of cross-correlation between the two monaural signals)

3) The physical level: how is it physically realized? (for example: with axonal delay lines and coincidence detectors)

Rate-based theories postulate that algorithms and representations can be defined independently of spikes that instantiate them. Indeed, as I argued in previous posts, the instantaneous firing rate is not a physical quantity but an abstraction (in general a probability of firing), and it is postulated that all algorithms can be defined at the level of rates, without loss. The conversion between spikes and rates is seen as independent from that level. In other words, the rate-based hypothesis is the postulate that the algorithmic and the physical levels are independent. In contrast, spike-based theories consider that these levels are not independent, i.e., that algorithms are defined at the spike level.

In the example of sound localization I used in the description of the three levels, the binaural neuron implements the cross-correlation between the two monaural signals. This is possible if one assumes that the monaural signals are transduced into spikes through Poisson processes with rates equal to these signals, that the binaural neuron responds to coincidences, and that the result is the spike count of the binaural neuron. This is rate-based theory (even though it is based on coincidence detection). Alternatively, in Goodman & Brette (2010), signals are transduced into spikes through an essentially deterministic process, and the binaural neuron spikes to signal the similarity between the transduced signals (note that a single spike is meaningful here, see my post on the difference between correlation and synchrony). This is spike-based theory. It also makes a functional difference in the example I just described, because in the spike-based version, the neuron is also sensitive to interaural intensity differences.
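
The rate-based reading of this example can be sketched as follows (the signal, rates and delays are made up; this is not the model of Goodman & Brette 2010, only an illustration of the Poisson/coincidence-counting version): the spike count of the binaural coincidence detector peaks at the internal delay that matches the interaural delay, i.e., it approximates the cross-correlation of the monaural signals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rate-based reading of the sound localization example: each monaural signal drives
# a Poisson spike train, a binaural neuron counts coincidences across an internal
# delay line, and its spike count approximates the cross-correlation of the signals.
# The signal, rates and delays below are illustrative assumptions.
dt, n_bins, itd = 0.001, 50000, 2            # time step (s), number of bins, interaural delay (bins)
signal = rng.uniform(0, 1, n_bins)           # broadband "monaural signal" (arbitrary units)
rate_left = 400.0 * signal                   # firing rate at the left ear (Hz)
rate_right = 400.0 * np.roll(signal, itd)    # same signal, delayed at the right ear

spikes_left = rng.random(n_bins) < rate_left * dt    # Poisson transduction
spikes_right = rng.random(n_bins) < rate_right * dt

for d in range(5):                           # candidate internal delays (axonal delay lines)
    count = np.sum(spikes_left & np.roll(spikes_right, -d))
    print(f"internal delay {d} bins: {count} coincidences")
# The coincidence count peaks at the internal delay matching the interaural delay,
# i.e. the spike count of the binaural neuron computes the signals' cross-correlation.
```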

When expressed as the independence between the algorithmic and spike levels, the rate-based hypothesis seems like an ad hoc postulate. Why would evolution make it such that it is possible to describe neural algorithms in terms of rates? What is the advantage from the organism’s point of view? This is why I see the rate-based hypothesis as a methodological postulate, rather than a true scientific hypothesis. That is, it is a postulate that makes it simpler for us, external observers, to describe and understand what neurons are doing. This is so because most of our calculus is based on operations on analog signals rather than on discrete entities (spikes). It is then hoped that this level of description is adequate, but there is no strong biological reason why it should be so. It just seems adequate enough to defenders of rate-based theories, and they endorse it because it is methodologically convenient.

This reminds me of discussions I have had with strong advocates of rate-based theories, who are also reasonable scientists. When faced with evidence and arguments that strongly suggest that rates cannot fully describe neural activity, they may agree. But they remain unconvinced, because they do not see why they should abandon a seemingly working theory (rate-based calculus) for a hypothesis that does not help them understand the system, even though it is more empirically valid (neural computation is based on spikes, but how exactly?). In other words, why bother with the extra complication of spikes? This is what I mean by a “methodological postulate”: it is not that, for empirical reasons, neurons are more likely to discard any information about spike timing, but rather that it seems conceptually more convenient to think in terms of analog quantities rather than spikes.

This means that this debate will not be resolved by accumulating empirical evidence for or against either alternative. For defenders of spike-based theories, it can only be resolved by providing a convincing theory of spike-based computation that could replace rate-based calculus. For defenders of rate-based theories, the challenge is rather to find mechanisms by which neural activity can truly be reduced to calculus with analog signals – a difficult task, as I will show in the next posts.

What is computational neuroscience? (III) The different kinds of theories in computational neuroscience

Before I try to answer the questions I asked at the end of the previous post, I will first describe the different types of approaches in computational neuroscience. Note that this does not cover everything in theoretical and quantitative neuroscience (see my first post).

David Marr, a very important figure in computational neuroscience, proposed that cognitive systems can be described at three levels:

1) The computational level: what does the system do? (for example: estimating the location of a sound source)

2) The algorithmic/representational level: how does it do it? (for example: by calculating the maximum of cross-correlation between the two monaural signals)

3) The physical level: how is it physically realized? (for example: with axonal delay lines and coincidence detectors)

Theories in computational neuroscience differ by which level is addressed, and by the postulated relationships between the three levels (see also my related post).

David Marr considered that these three levels are independent. Francisco Varela described this view as “computational objectivism”. This means that the goal of the computation is defined in terms that are external to the organism. The two other levels describe how this goal is achieved, but they have no influence on what is achieved. It is implied that evolution shapes levels 2 and 3 by imposing the first level. It is important to realize that theories that follow this approach necessarily start from the highest level (defining the object of information processing), and only then analyze the lower levels. Such approaches can be restricted to the first level, or the first two levels, but they cannot address only the third level, or the second level, because these are defined by the higher levels. It can be described as a “top-down” approach.

The opposite view is that both the algorithmic and computational levels derive from the physical level, i.e., they emerge from the interactions between neurons. Varela described it as “neurophysiological subjectivism”. In this view, one would start by analyzing the third level, and then possibly go up to the higher levels – this is a “bottom-up” approach. This is the logic followed by the data-driven approaches that I criticized in my first post. I criticized it because this view fails to acknowledge the fact that living beings are intensely teleonomic, i.e., the physical level serves a project (invariant reproduction, in the words of Jacques Monod). This is not to say that function is not produced by the interaction of neurons – it has to be, in a materialistic view. But as a method of scientific inquiry, analyzing the physical level independently of the higher levels, as if it were a non-living object (e.g. a gas), does not seem adequate – at least it seems overly optimistic. As far as I know, this type of approach has produced theories of neural dynamics, rather than theories of neural computation - for example, showing how oscillations or some other large-scale aspect of neural networks might emerge from the interaction of neurons. In other words, in Marr’s hierarchy, such studies are restricted to the third level. Therefore, I would categorize them as theoretical neuroscience rather than computational neuroscience.

These two opposite views roughly correspond to externalism and internalism in philosophy of perception. It is important to realize that these are important philosophical distinctions, which have considerable epistemological implications, in particular on what is considered a “realistic” model. Computational objectivists would insist that a biological model must serve a function, otherwise it is simply not about biology. Neurophysiological subjectivists would insist that the models must agree with certain physiological experiments, otherwise they are empirically wrong.

There is another class of approaches in philosophy of perception, which can be seen as intermediate between these two: the embodied approaches. These consider that the computational level cannot be defined independently of the physical level, because the goal of computation can only be defined in terms that are accessible to the organism. In the more external views (Gibson/O’Regan), this means that the computational level actually includes the body, but the neural implementation is seen as independent of the computational level. For example, in Gibson’s ecological approach and in O’Regan’s sensorimotor theory, the organism looks for information about the world implicit in its sensorimotor coupling. This differs quite substantially from computational objectivism in the way the goal of the computation is defined. In computational objectivism, the goal is defined externally, for example: to estimate the angle between a sound source and the head. Sensorimotor theories acknowledge that the notion of “angle” is one of an external observer with some measurement apparatus; it cannot be one of an organism. Instead, in sensorimotor approaches, direction is defined subjectively (contrary to computational objectivism), but still in reference to an external world (contrary to neurophysiological subjectivism), as the self-generated movement that would make the sound move to the front (an arbitrary reference point). In the more internal views (e.g. Varela), the notion of computation itself is questioned, as it is considered that the goal is defined by the organism itself. This is Varela’s concept of autopoiesis, according to which a living entity acts so as to maintain its own organization. “Computation” is then a by-product of this process. This last class of approaches is currently less developed in computational neuroscience.

The three types of approaches I have described mostly concern the relationship between the computational and physical levels, and they are tightly linked with different views in philosophy of perception. There is also another dividing line between theories of neural computation, which has to do with the relationship between the algorithmic and physical levels. This is related to the rate-based vs. spike-based theories of neural computation (see my series of posts on the subject).

In Marr’s view and in general in rate-based views, the algorithmic and physical levels are mostly independent. Because algorithms are generally described in terms of calculus with analog values, spikes are generally seen as implementing analog calculus. In other words, spikes only reflect an underlying analog quantity, the firing rate of a neuron, on which the algorithms are defined. The usual view is that spikes are produced randomly with some probability reflecting the underlying rate (an abstract quantity).

On the contrary, another view holds that algorithms are defined at the level of spikes, not of rates. Such theories include the idea of binding by synchrony (Singer/von der Malsburg), in which neural synchrony is the signature of a coherent object, the related idea of synfire chains (Abeles), and more recently the theories developed by Sophie Denève and by myself (there is also Thorpe’s rank-order coding theory, but it is more on the side of coding than computation). In the latter two theories, spiking is seen as a decision. In Denève’s approach, the neuron spikes so as to reduce an error criterion. In my recent paper on computing with synchrony, the neuron spikes when it observes unlikely coincidences, which signal some invariant structure (in the sense of Gibson). In both cases, the algorithm is defined directly at the level of spikes.

In summary: theories of neural computation can be classified according to the implicit relationships between the three levels of analysis described by Marr. It is important to realize that these are not purely scientific differences (by this, I mean not simply about empirical disputes), but really philosophical and/or epistemological differences. In my view this is a big issue for the peer-reviewing system, because it is difficult to have a paper accepted when the reviewers or editors do not share the same epistemological views.

Affordances, subsumption, evolution and consciousness

James Gibson defended the idea that what we perceive of the environment is affordances, that is, the possibilities of interaction that it allows. For example, a knob affords twisting, or the ground affords support. The concept of affordance makes a lot of sense, but Gibson also insisted that we directly perceive these affordances. It has never been very clear to me what he meant by that. But following recent discussions, I have thought of a way in which this statement might make sense - although I have no idea whether this is what Gibson meant.

The way sensory systems work is traditionally described as an early extraction of “features”, like edges, which are then combined through a hierarchical architecture into more and more complex things, until one gets “an object”. In this view, affordances are obtained at the end of the chain, so perceiving them is not direct at all. In robotics, another kind of architecture was proposed by Rodney Brooks in the 1980s, the “subsumption architecture”. It was meant as a way to build robots incrementally, by progressively adding layers of complexity. In his example, the first layer of the robot is a simple control system by which external motor commands produce movement; a sonar computes a repulsive force when there is a wall in front of the robot, and this force is sent to the motor module. Then there is a second layer that makes the robot wander: it randomly chooses a direction at regular intervals and combines it with the force computed by the sonar in the first layer. The second layer is said to “subsume” the first one, i.e., it takes over. Then there is another level on top of it. The idea in this architecture is that the set of levels below any given level is functional: it can do something on its own. This is quite different from standard hierarchical sensory systems, in which the only purpose of each level is to send information to the next level. Here we get to Gibson’s affordances: if the most elementary level must be functional, then what it senses is not simple features, but rather simple affordances, simple ways to interact with its environment. So in this view, what is elementary in perception is affordances, rather than elementary sensations.
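
Here is a toy sketch of the two layers described above (the sensor, the force computation and the arbitration rule are all invented for illustration): the lower layer is functional on its own, and the wander layer simply injects its command into it.

```python
import random

random.seed(0)

# Toy sketch of the two layers described above (avoid + wander); the sensor, the
# force computation and the arbitration rule are all illustrative assumptions.

def avoid_layer(sonar_distance):
    """Layer 1: repulsive force away from a wall detected by the sonar."""
    return -2.0 / sonar_distance if sonar_distance < 1.0 else 0.0

position, heading = 0.0, 0.0
for step in range(50):
    sonar_distance = 5.0 - position           # hypothetical wall at position 5
    if step % 10 == 0:                        # layer 2: wander, new random heading
        heading = random.uniform(-1.0, 1.0)
    # Subsumption: the wander command is combined with (and overridden by, near the
    # wall) the repulsive force computed by the lower layer.
    position += 0.1 * (heading + avoid_layer(sonar_distance))
print(f"final position: {position:.2f} (the wall at 5.0 is never reached)")
```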

I think it makes a lot of sense from an evolutionary point of view that sensory systems should look like subsumption architectures rather than standard hierarchical perception systems. If each new structure (say, the cortex) is added on top of an existing set of structures, then the old set of structures has a function by itself, independently of the new structure. Somehow the old set is “subsumed” by the new structure, and the information this new structure gets must then already have a functional meaning. This would mean that affordances are the basis, and not the end result, of the sensory system. In this sense, perhaps, one might say that affordances are “directly” perceived.

When thinking about what it means for consciousness, I like to refer to the man and horse analogy. The horse is perfectly functional by itself. It can run, it can see, etc. Now the man on top of it can “subsume” the horse. He sends commands to it so as to move it where he wants, and also gets signals from the horse. The man is conscious, but he has no idea of what the horse feels, for example how the horse feels the ground. All the sensations that underlie the man-horse’s ability to move around are inaccessible to the conscious man, but it is not a problem at all for the man to go where he wants to.

Now imagine that the man is blind. If there is an obstacle in front of the horse, the horse might stop, perhaps get nervous, things that the man can feel. The man cannot feel the wall in terms of “raw sensations”, but he can perceive that there is something that blocks the way. In other words, he can perceive the affordance of the wall – something that affords blocking, without seeing the wall.

So in this sense, it does not seem crazy anymore that what we directly perceive (we = our conscious self) is made of affordances rather than raw sensations.

What is computational neuroscience? (II) What is theory good for?

To answer this question, I need to write about basic notions of epistemology (the philosophy of knowledge). Epistemology is concerned in particular with what knowledge is and how it is acquired.

What is knowledge? Essentially, knowledge is statements about the world. There are two types of statements. First, there are specific statements or “observations”, for example, “James has two legs”. Second, there are universal statements, such as “All men have two legs”: this statement applies to an infinite number of observations, about men I have seen but also about men I might see in the future. We also call universal statements “theories”.

How is knowledge acquired? The naive view, classical inductivism, consists in collecting a large number of observations and generalizing from them. For example, one notes that all the men one has seen so far have two legs, and concludes that all men have two legs. Unfortunately, inductivism cannot produce universal statements with certainty. It is entirely possible that one day you might see a man with only one leg. The problem is that there are always an infinite number of universal statements that are consistent with any finite set of observations. For example, you can continue a finite sequence of numbers with any numbers you want, and it will still be a possible sequence of numbers.
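
To make this concrete, here is a toy illustration: whatever number you decide comes next after 1, 2, 3, there is a polynomial “law” that fits all the observations and predicts exactly that continuation.

```python
import numpy as np

# Any continuation of the observed sequence 1, 2, 3 is consistent with some polynomial
# "law" fitted to the observations (a toy illustration of underdetermination).
observed = [1.0, 2.0, 3.0]
for next_value in (4.0, 17.0, -100.0):
    x = np.arange(1, len(observed) + 2)              # positions 1..4
    y = np.array(observed + [next_value])
    coeffs = np.polyfit(x, y, deg=len(x) - 1)        # exact interpolation
    prediction = np.polyval(coeffs, x)
    print(f"law predicting {next_value}: coefficients {np.round(coeffs, 3)}, "
          f"fits observations: {np.allclose(prediction[:3], observed)}")
```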

Therefore, inductivism cannot guide the development of knowledge. Karl Popper, probably the most influential philosopher of science of the twentieth century, proposed to solve this problem with the notion of falsifiability. What distinguishes a scientific statement from a metaphysical statement is that it can be disproved by an experiment. For example, “all men have two legs” is a scientific statement, because the theory could be disproved by observing a man with one leg. But “there is a God” is not a scientific statement. This is not to say that these statements are true or not true, but that they have a scientific nature or not (but note that, by definition, a metaphysical statement can have no predictable impact on any of our experience, otherwise this would produce a test of that statement).

Popper’s concept of falsifiability has had a huge influence on modern science, and it essentially determines what we call “experimental work” and “theoretical work”. In Popper’s view, an experiment is an empirical test designed to falsify a theory. More generally, it is a situation for which different theories predict different outcomes. Note how this concept is different from the naive idea of “observing the laws of nature”. Laws of nature cannot be “observed” because an experiment is a single observation, whereas a law is a universal statement. Therefore, from a logical standpoint, the role of an experiment is rather to distinguish between otherwise consistent theories.

The structure of a typical experimental paper follows this logic: 1) Introduction, in which the theoretical issues are presented (the different hypotheses about some specific subject); 2) Methods, in which the experiment is described in detail, so as to be reproducible; 3) Results, in which the outcomes are presented; 4) Discussion, in which the outcomes are shown to corroborate or invalidate various theories. Thus, an experimental paper is about formulating and performing a critical test of one, or usually several, theories.

Popper’s line of thinking seems to imply that knowledge can only progress through experimental work. Indeed, on purely logical grounds a theory can only be judged consistent or inconsistent, so there is no way to distinguish between two logically consistent theories. Only empirical tests can corroborate or invalidate theories, and therefore produce knowledge. Hence the occasional demeaning comments that any theoretician has heard, to the effect that theories are mind games for a bunch of smart, math-oriented people; that is, that theory is useless, since only empirical work can produce scientific knowledge.

This is a really paradoxical remark, for theory is the goal of scientific progress. Science is not about accumulating data, it is about finding the laws of nature, a.k.a. theories. It is precisely the predictive nature of science that makes it useful. How can it be that science is about making theories, but that science can only progress through empirical work?

Maybe this is a misunderstanding of Popper’s reasoning. Falsifiability is about how to distinguish between theories. It clarifies what empirical work is about, and what distinguishes science from metaphysics. But it says nothing about how theories are formulated in the first place. Falsifiability is about empirical validation of theories, not about the mysterious process of making theories, which we might say is the “hard problem” of philosophy of science. Yet making theories is a central part of the development of science. Without theory, there is simply no experiment to be done. But more importantly, science is made of theories.

So I can now answer the question I started with. Theories constitute the core of any science. Theoretical work is about the development of theories. Experimental work is about the testing of theories. Accordingly, theoretical papers are organized quite differently from experimental papers, because the methodology is very different, but also because there is no normalized methodology (“how it should be”). A number of computational journals insist on enforcing the structure of experimental papers (introduction / methods / results / discussion), but I believe this is due to the view that simulations are experiments (Winsberg, Philosophy of Science 2001), which I will discuss in another post.

Theory is often depicted as speculative. This is quite right. Theory is, in essence, speculative, since it is about making universal statements. But this does not mean that theory is nonsense. Theories are usually developed so as to be consistent with a body of experimental data, i.e., they have an empirical basis. Biological theories also often include a teleonomic element, i.e., they “make sense”. These two elements impose hard constraints on theories. In fact, they are so constraining that I do not know of any theory that is consistent with all (or even most) experimental data and that makes sense in a plausible ecological context. So theory making is about finding principled ways to explain existing data, and at the same time to explain biological function. Because this is such a difficult task, theoretical work can have some autonomy, in the sense that it can produce knowledge in the absence of new empirical work.

This last point is worth stressing, because it departs significantly from the standard Popperian view of scientific progress, which makes it a source of misunderstandings between theoreticians and experimenters. The reason is the complexity of biological organisms, shaped by millions of years of evolution. Biological organisms are made of physical things that we understand at some level (molecules), but at the same time they serve a project (the global project being reproductive invariance, in the words of Jacques Monod). That they serve a project is not the simple result of the interaction of these physical elements; rather, it is the result of evolutionary pressure. This means that even though, on one hand, we understand physics, or biophysics, to a high degree of sophistication, and, on the other hand, there are well-established theories of biological function, there still is a huge explanatory gap between the two. This gap is largely theoretical, in the sense that we are looking for a way to make these two aspects logically consistent. This is why I believe theoretical work is so important in biology. It also has two consequences that can be hard to digest for experimenters: 1) theory can be autonomous to some extent (i.e., there can be “good” and “bad” theories, independently of new empirical evidence), and 2) theoretical work is not necessarily aimed at making experimental predictions.

This discussion raises many questions that I will try to answer in the next posts:

- Why are theoretical and experimental journals separate?

- Should theories make predictions?

- Should theories be consistent with data?

- What is a “biologically plausible” model? And by the way, what is a model?

- Is simulation a kind of experiment?

Rate vs. timing (VI) Synaptic unreliability

How much intrinsic noise is there in a neuron? This question would deserve a longer post, but here I will just make a few remarks. In vitro, when the membrane potential is recorded in current-clamp, little noise is seen. There could be hidden noise in the spike-generating process (i.e., in the sodium channels), but when a time-varying current is injected somatically into a cortical neuron, the spike trains are also highly reproducible (Mainen & Sejnowski, 1995). This means that the main source of intrinsic noise in vivo is synaptic unreliability.

Transmission at a given synapse is unreliable in general. That is, there is a high probability of transmission failure, in which there is a presynaptic spike but no postsynaptic potential. However, an axon generally contacts a postsynaptic neuron at multiple release sites, which we may consider independent. If there are N sites, each with transmission probability p, then the noise variance represents a fraction x = (1-p)/(pN) of the squared mean PSP size. We can pick some numbers from Branco & Staras (2009). The numbers vary quite a bit across studies, but they give an order of magnitude. For cat and rat L2/3 pyramidal cells, we have for example N = 4 and p = 0.5 (ref. 148), which gives x = 0.25. Another reference (ref. 149) gives x = 0.07 for the same cells.
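As a quick back-of-the-envelope check of these figures, here is a minimal sketch (mine, assuming simple binomial release across N independent sites with a fixed quantal size):

```python
# Binomial release model: N independent sites, release probability p,
# quantal size q. Mean PSP = N*p*q, variance = N*p*(1-p)*q^2, so the
# noise variance as a fraction of the squared mean is x = (1-p)/(p*N),
# independent of q.

def noise_fraction(N, p):
    return (1 - p) / (p * N)

print(noise_fraction(N=4, p=0.5))   # -> 0.25, the figure quoted from ref. 148
```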

These numbers are not that big. But transmission probability may well be lower in vivo, so we have to recognize that synaptic noise might be substantial. However, even if this is true, it is an argument in favor of the stochasticity of neural computation, not in favor of rate-based computation. In addition, I would like to add that synaptic unreliability has little impact on theories based on synchrony and coincidence detection. Indeed, a volley of synchronous presynaptic spikes arriving at a postsynaptic neuron has an essentially deterministic effect, by the law of large numbers. That is, synchronous input spikes are equivalent to additional release sites. If there are m synchronous spikes, then the noise variance represents a fraction x = (1-p)/(pmN) of the squared mean compound PSP. Taking the same numbers as above, with 10 synchronous spikes we get x = 0.025 (ref. 148) and x = 0.007 (ref. 149), i.e., an essentially deterministic compound PSP. And we have shown that neurons are very sensitive to fast depolarizations in a background of noise (Rossant et al., 2011). The theory of synfire chains is also about the propagation of synchronous activity in a background of noise, i.e., it takes synaptic unreliability into account.
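The same sketch, extended to a synchronous volley (again, my own illustration under the binomial assumption):

```python
# A volley of m synchronous spikes acts like m*N independent release sites,
# so the noise fraction of the compound PSP is x = (1-p)/(p*m*N).

def volley_noise_fraction(m, N, p):
    return (1 - p) / (p * m * N)

print(volley_noise_fraction(m=10, N=4, p=0.5))   # -> 0.025 (numbers from ref. 148)
```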

In summary, the main source of intrinsic noise in neurons is synaptic noise. Experimental figures from the literature indicate that it is not extremely large, but possibly substantial. However, as I noted in previous posts, the presence of large intrinsic noise does not invalidate spike-based theories, but rather deterministic theories. In addition, synaptic noise has essentially no impact on synchronous events, and therefore it is largely irrelevant for synchrony-based theories.