What is computational neuroscience? (I) Definitions and the data-driven approach

What is computational neuroscience? Simply put, it is the field concerned with how the brain computes. The word “compute” is not necessarily an analogy with the computer, and it must be understood in a broad sense. It simply refers to the operations that must be carried out to perform cognitive functions (walking, recognizing a face, speaking). Put this way, it might seem that this is pretty much the entire field of neuroscience. What distinguishes computational neuroscience, then, is that this field seeks a mechanistic understanding of these operations, to the point that they could potentially be simulated on a computer. Note that this means neither that computational neuroscience is mostly about simulating the brain, nor that the brain is thought of as a computer. It simply reflects the materialistic assumption that, if all the laws that underlie cognition were known in detail, then it should be possible to reproduce them artificially (given sufficient equipment).

A related term is “theoretical neuroscience”. This is somewhat broader than computational neuroscience, and is probably an analogy with theoretical physics, the branch of physics that relies heavily on mathematical models. Theoretical neuroscience is not necessarily concerned with computation, at least not directly. One example is the demonstration that action potential velocity is proportional to axon diameter in myelinated axons, and to the square root of the diameter in unmyelinated axons. This demonstration uses cable theory, a biophysical theory describing the propagation of electrical activity in axons and dendrites.
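
For readers curious about where these scaling laws come from, here is a rough sketch using standard cable-theory quantities (my notation, not a full derivation): with specific membrane resistance R_m, axial resistivity R_i, specific capacitance C_m and diameter d,

```latex
\lambda = \sqrt{\frac{d\,R_m}{4 R_i}}, \qquad \tau_m = R_m C_m, \qquad v \sim \frac{\lambda}{\tau_m}
```

Since the space constant grows as the square root of d while the membrane time constant does not depend on d, this heuristic gives a velocity proportional to the square root of the diameter for unmyelinated axons; in myelinated axons the internode distance scales roughly linearly with diameter, which gives a velocity proportional to the diameter.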

“Quantitative neuroscience” also refers to the use of quantitative mathematical models as a tool to understand brain function or dynamics, but the substitution of “quantitative” for “theoretical” suggests that the field is more concerned with data analysis (as opposed to theories of how the brain works).

Finally, “neural modeling” is concerned with the use of quantitative neural models, generally biophysical models. The terminology suggests a data-driven approach, i.e., building models of neural networks from experimental measurements, based on existing theories. This is why I am somewhat uneasy with this terminology, for epistemological reasons. The data-driven approach implicitly assumes that it is possible and meaningful to build a functioning neural network from a set of measurements alone. This raises two critical issues. One is that it is based on what Francisco Varela called “neurophysiological subjectivism” (see this related post), the idea that perception is the result of neural network dynamics. Neurophysiological subjectivism is problematic because (in particular) it fails to fully recognize the defining property of living beings, which is teleonomy (in other words, function). Living organisms are constrained on one hand by their physical substrate; on the other hand, this substrate is itself tightly constrained by evolution – this is precisely what makes them living beings and not just spin glasses. The data-driven approach only considers the constraints deriving from measurements, not the functional constraints, but this essentially amounts to denying that the object of study is part of a living being. Alternatively, it assumes that measurements are sufficiently constraining that function is entirely implied, which seems naive.

The second major issue with the data-driven approach is that it has a strong flavor of inductivism. That is, it implicitly assumes that a functioning model is directly implied by a finite set of measurements. But inductivism is a philosophical error, for there are an infinite number of theories (or “models”) consistent with any finite set of observations (an error pointed out by Hume, for example). In fact, Popper and his followers also noted that inductivism commits another philosophical error, which is to think that there is such a thing as a “pure observation”. Experimental results are always to be interpreted in a specific theoretical context (a.k.a. the “Methods” section). One does not “measure” a model. One performs a specific experiment and observes the outcome with tools, which are themselves based on currently accepted theories. In other words, an experimental result is the answer to a specific question. But the type of question is not “What is the time constant of the model?”, but rather “What exponential function can I best fit to the electrical response of this neuron to a current pulse?”. Measurements may then provide constraints on possible models, but they never imply a model. In addition, as I noted above, physical constraints (implied by measurements) are only one side of the story, functional constraints are the other side. Neglecting this other side means studying a “soup of neurons”, not the brain.
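
To make this concrete, here is a minimal sketch (entirely synthetic data and hypothetical values, using NumPy/SciPy) of what “measuring a time constant” amounts to in practice: fitting an exponential, i.e., an assumed model class, to the voltage response to a current pulse.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Synthetic "recording": voltage response of a passive membrane to a current step,
# with measurement noise added (all values are hypothetical).
t = np.arange(0, 100e-3, 0.1e-3)                 # time (s)
tau_true, dV_true = 20e-3, 10e-3                 # "true" time constant (s) and steady-state deflection (V)
v = dV_true * (1 - np.exp(-t / tau_true)) + 0.2e-3 * rng.standard_normal(t.size)

# The "measurement" is really a fit within an assumed model class (a single exponential).
def charging_curve(t, dV, tau):
    return dV * (1 - np.exp(-t / tau))

(dV_fit, tau_fit), _ = curve_fit(charging_curve, t, v, p0=(5e-3, 10e-3))
print(f"fitted time constant: {tau_fit * 1e3:.1f} ms (a model-dependent estimate)")
```

The number that comes out is the answer to a model-dependent question, not a raw observation.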

In summary, it is often stated or implied that “realistic” models are those that are based on measurements: this is 1) an inductivist mistake, 2) a tragic disregard of what defines living beings, i.e. functional constraints.

I will end this post by asking a question: what is a better description of the brain? A soup of “realistic” neurons or a more conceptual mechanistic description of how interacting neurons support cognitive functions?

Rate vs. timing (V) Fast rate-based coding

Misconception #4: “A stochastic spike-based theory is nothing else than a rate-based theory, only at a finer timescale”.

It is sometimes claimed or implied that there is no conceptual difference between the two kinds of theories, the only difference being the timescale of the description (short timescale for spike-based theories, long timescale for rate-based theories). This is a more subtle misconception, which stems from a confusion between coding and computation. If one only considers the response of a neuron to a stimulus and how much information there is in that response about the stimulus, then yes, this statement makes sense.

But rate-based and spike-based theories are not simply theories of coding, they are also theories of computation, that is, of how responses of neurons depend on the responses of other neurons. The key assumption of rate-based theories is that it is possible and meaningful to reduce this transformation to a transformation between analog variables r(t), the underlying time-varying rates of the neurons. These are hidden variables, since only the spike trains are observable. The state of the network is then entirely defined by the set of time-varying rates. Therefore there are two underlying assumptions: 1) that the output spike train can be derived from its rate r(t) alone, 2) that a sufficiently accurate approximation of the presynaptic rates can be derived from the presynaptic spike trains, so that the output rate can be calculated.

Since spike trains are considered as stochastic with (expected) instantaneous rate r(t), assumption #1 means that spike trains are stochastic point processes defined from and consistent with the time-varying rate r(t) – they could be Poisson processes, but not necessarily. The key point here is that the spiking process is only based on the quantity r(t). This means in particular that the source of noise is independent between neurons.
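
As an illustration of assumption #1 (a toy sketch of my own, not taken from any specific paper): several neurons whose spike trains are generated as independent inhomogeneous Poisson-like processes driven by the same underlying rate r(t), each with its own private noise source.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-3, 2.0                                # time step (s), duration (s)
t = np.arange(0, T, dt)
r = 20 + 15 * np.sin(2 * np.pi * 1.0 * t)        # shared underlying rate r(t), in Hz (hypothetical)

n_neurons = 5
# Bernoulli approximation of an inhomogeneous Poisson process: in each bin,
# each neuron spikes with probability r(t)*dt, using its own private noise source.
spikes = rng.random((n_neurons, t.size)) < r * dt

for i in range(n_neurons):
    print(f"neuron {i}: {spikes[i].sum()} spikes, empirical rate {spikes[i].sum() / T:.1f} Hz")
```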

The second assumption means that the operation performed on input spike trains is essentially independent of the specific realizations of the random processes. There are two possible cases. One alternative is that the law of large numbers can be applied, so that integrating inputs produces a deterministic value that depends on the presynaptic rates. But then the source of noise, which produces stochastic spike trains from a deterministic quantity, must be entirely intrinsic to the neuron. Given what we know from experiments in vitro (Mainen and Sejnowski, 1995), this is a fairly strong assumption. The other alternative is that the output rate depends on higher statistical orders of the total input (e.g. variance) and not only on the mean (e.g. through the central limit theorem). But in this case, the inputs must be independent, for otherwise it would not be possible to describe the output rate r(t) as a single quantity, since the transformation would also depend on higher-order quantities (correlations).
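
To see why independence matters in this argument, here is a toy sketch (with simplifying assumptions of my own): averaging the spike counts of N independent presynaptic inputs gives a variance that shrinks like 1/N, whereas a shared (correlated) component keeps a finite variance however large N is.

```python
import numpy as np

rng = np.random.default_rng(1)
rate, dt, n_trials = 10.0, 10e-3, 5000           # presynaptic rate (Hz), counting window (s), trials

for N in (10, 100, 1000):
    # Independent inputs: each presynaptic spike count is an independent Poisson draw.
    indep = rng.poisson(rate * dt, size=(n_trials, N)).mean(axis=1)
    # Correlated inputs: a shared Poisson source contributes to every input.
    shared = rng.poisson(0.5 * rate * dt, size=(n_trials, 1))
    private = rng.poisson(0.5 * rate * dt, size=(n_trials, N))
    corr = (shared + private).mean(axis=1)
    print(f"N={N:4d}  var(independent) = {indep.var():.5f}   var(correlated) = {corr.var():.5f}")
```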

In other words, the assumptions of rate-based theories mean that spike trains are realizations of independent random processes, with a source of stochasticity entirely intrinsic to the neuron. This is a strong assumption that has little to do with the description timescale.

This assumption is also known to be inconsistent in general in the theory of spiking neural networks. It is possible to derive self-consistent equations that describe the transformation between the input rates of independent spike trains and the output rate of an integrate-and-fire model (Brunel 2001), but these equations only hold if one postulates that connections between neurons are sparse and random. This postulate means that there are no short cycles in the connectivity graph, so that the inputs to a neuron are effectively independent. Otherwise, the assumption of independent outputs is inconsistent with the overlap in inputs between neurons. Unfortunately, neural networks in the brain are known to be non-random and to contain short cycles (Song et al. 2005).
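
For reference, the kind of self-consistent relation alluded to here takes, in the diffusion approximation, the standard textbook form (the notation below is mine, not taken from the cited paper): the stationary output rate ν of a leaky integrate-and-fire neuron with membrane time constant τ, refractory period τ_ref, threshold θ and reset V_r, receiving uncorrelated inputs with mean μ and standard deviation σ, satisfies

```latex
\frac{1}{\nu} \;=\; \tau_{\mathrm{ref}} \;+\; \tau \sqrt{\pi} \int_{(V_r - \mu)/\sigma}^{(\theta - \mu)/\sigma} e^{u^2} \left(1 + \operatorname{erf}(u)\right) \, du
```

Self-consistency requires that μ and σ, which are set by the presynaptic rates, produce this same rate ν in the postsynaptic population – a construction that only makes sense if the inputs can be treated as independent, hence the sparse random connectivity postulate.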

To be fair, it is still possible that neurons that share inputs have weakly correlated outputs, if inhibition precisely tracks excitation (Renart et al. 2010). But it should be stressed that it is the assumptions of rate-based theories that require a specific non-trivial mechanism, rather than those of spike-based theories. It is ironic that spike-based theories are sometimes depicted as exotic by proponents of rate-based theories, while the burden of proof should in fact reside with the latter.

To summarize this post: the debate of rate vs. timing is not about the description timescale, but about the notion that neural activity and computation may be entirely and consistently defined by the time-varying rates r(t) in the network. This boils down to whether neurons spike in a stochastic, independent manner, conditionally on the input rates. It is worth noting that this is a very strong assumption, with currently very little evidence in favor of it, and a lot of evidence against it.

Rate vs. timing (IV) Chaos

Misconception #3: “Neural codes can only be based on rates because neural networks are chaotic”. Whether this claim is true or not (and I will comment on it below), chaos does not imply that spike timing is irrelevant. To draw this conclusion is to commit the same category error as I discussed in the previous post, i.e., confusing rate vs. timing and stochastic vs. deterministic.

In a chaotic system, nearby trajectories quickly diverge. This means that it is not possible to predict the future state from the present state, because any uncertainty in estimating the present state results in large errors in the predicted future state. For this reason, the state of the system at a distant time in the future can be seen as stochastic, even though the system itself is deterministic.

Specifically, in vitro experiments suggest that individual neurons are essentially deterministic devices (Mainen and Sejnowski 1995) – at least the variability seen in in vitro recordings is often orders of magnitude lower than in vivo. But a system composed of interacting neurons can be chaotic, and therefore for all practical purposes its state can be seen as random – so the chaos argument goes.

The fallacy of this argument can be seen by considering the prototypical chaotic system, climate. It is well known that the weather cannot be predicted more than 15 days in the future, because even tiny uncertainties in measurements make the climate models diverge very quickly. But this does not mean that all you can do is pick a random temperature according to the seasonal distribution. It is still possible to make short term predictions, for example. It also does not mean that climate dynamics can be meaningfully described only in terms of mean temperatures (and other mean parameters). For example, there are very strong correlations between weather events occurring at nearby geographical locations. Chaos implies that it is not possible to make accurate predictions in the distant future. It does not imply that temperatures are random.

In the same way, the notion that neural networks are chaotic only implies that one cannot predict the state of the network in the distant future. This has nothing to do with the distinction between rate and spike timing. Rate (like mean seasonal temperature) may still be inadequate to describe the dynamics of the system, and firing may still be correlated across neurons.

In fact the chaos argument is an argument against rate-based theories, precisely because a chaotic system is not a random system. In particular, in a chaotic system there are lawful relationships between the different variables. Taking the example of climate again, the solutions of the Lorenz equations (a model of atmospheric convection) live in a low-dimensional manifold with a butterfly shape known as the Lorenz attractor. Even though one cannot predict the values of the variables in the distant future, these variables evolve in a very coordinated way. It would be a mistake to replace them by their average values. Therefore, if it is true that neural networks are chaotic, then it is probably not true that their dynamics can be described in terms of rates only.
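
Here is a small sketch of this point (my own illustration, using a simple Euler integration of the Lorenz equations): two trajectories starting from nearly identical initial conditions end up far apart, yet within each trajectory the variables remain strongly coordinated rather than independent.

```python
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One (crude) Euler step of the Lorenz equations; good enough for illustration.
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

s1 = np.array([1.0, 1.0, 1.0])
s2 = s1 + 1e-6                                   # tiny perturbation of the initial condition
traj1, traj2 = [], []
for _ in range(10000):
    s1, s2 = lorenz_step(s1), lorenz_step(s2)
    traj1.append(s1)
    traj2.append(s2)
traj1, traj2 = np.array(traj1), np.array(traj2)

print("final distance between the two trajectories:",
      round(float(np.linalg.norm(traj1[-1] - traj2[-1])), 2))
print("correlation between x and y within one trajectory:",
      round(float(np.corrcoef(traj1[:, 0], traj1[:, 1])[0, 1]), 2))
```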

I will end this post by commenting on the notion that neural networks are chaotic. I very much doubt that chaos is an adequate concept to describe spiking dynamics. There are different definitions of a chaotic system, but essentially they state that a chaotic system is a system that is very sensitive to initial conditions, in the sense that two trajectories that are initially very close can be very far apart after a relatively short time. Now take a neuron and inject a constant current: it will fire regularly. In the second trial, inject the exact same current but 1 ms later. Initially the state of the neuron is almost identical in both trials. But when the neuron fires in the first trial, its membrane potential diverges very quickly from the trajectory of the second trial. Is this chaos? Of course not, because the trajectories meet again about 1 ms later. In fact, I showed in a study of spike time reliability in spiking models (Brette and Guigon, 2003) that even if the trajectories diverge between spikes (such as with the model dv/dt=v/tau), spike timing can still be reliable in the long run in response to fluctuating inputs. This counter-intuitive property can be seen as nonlinear entrainment.
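
The following sketch illustrates the flavor of this reliability result with a standard leaky integrate-and-fire neuron (not the specific unstable model of that study, and with made-up parameters): the same frozen fluctuating input is injected in two trials starting from very different initial conditions, and the late spike times nonetheless coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, T, tau = 0.1e-3, 2.0, 20e-3                  # time step, duration, membrane time constant (s)
t = np.arange(0, T, dt)

# Frozen fluctuating input, identical in both trials (Ornstein-Uhlenbeck-like, hypothetical parameters).
tau_I, sigma_I = 5e-3, 0.5
I = np.empty(t.size)
I[0] = 0.0
for k in range(1, t.size):
    I[k] = I[k - 1] - dt / tau_I * I[k - 1] + sigma_I * np.sqrt(2 * dt / tau_I) * rng.standard_normal()
I += 1.0                                         # mean drive near threshold

def run_lif(v0, threshold=1.0, reset=0.0):
    v, spikes = v0, []
    for k in range(t.size):
        v += dt / tau * (-v + I[k])
        if v >= threshold:
            spikes.append(t[k])
            v = reset
    return np.array(spikes)

spikes_a, spikes_b = run_lif(v0=0.0), run_lif(v0=0.9)   # very different initial conditions
print("last spikes, trial A (s):", np.round(spikes_a[-3:], 4))
print("last spikes, trial B (s):", np.round(spikes_b[-3:], 4))
```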

In summary, 1) chaos does not support rate-based theories, it rather invalidates them, and 2) chaos is probably not a very meaningful concept to describe spiking dynamics.

Rate vs. timing (III) Another category error

Misconception #2: “Neural responses are variable in vivo, therefore neural codes can only be based on rates”. Again, this is a category error. Neural variability (assuming this means randomness) is about determinism vs. stochasticity, not about rate vs. timing. There can be stochastic or deterministic spike-based theories.

I will expand on this point, because it is central to many arguments in favor of rate-based theories. There are two ways to understand the term “variable”, and I will first discard the meaning based on temporal variability. Interspike intervals (ISIs) are highly variable in the cortex (Softky and Koch, 1993), and their distribution is close to an exponential (or Gamma) function, as for Poisson processes (possibly with a refractory period). This could be interpreted as a sign that spike trains are realizations of random point processes. This argument is very weak, because the exponential distribution is also the maximum-entropy distribution for a given mean interval (i.e., a given average rate), which means that maximizing the information content in the timing of spikes of a single train also implies an exponential distribution of ISIs. Temporal variability therefore cannot distinguish between rate-based and spike-based theories.
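
The maximum-entropy fact used here is the standard one: among all distributions of a positive interval with a fixed mean (i.e., a fixed average rate), the exponential distribution maximizes the differential entropy:

```latex
p(\Delta t) = \frac{1}{\overline{\Delta t}}\, e^{-\Delta t/\overline{\Delta t}}, \qquad
h[p] = 1 + \ln \overline{\Delta t} = \max_{q:\; \mathbb{E}_q[\Delta t] = \overline{\Delta t}} h[q]
```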

Therefore the only reasonable variability-based argument in support of the rate-based view is the variability of spike trains across trials. In the cortex (but not so much in some early sensory areas such as the retina and parts of the auditory brainstem), both the timing and the number of spikes produced by a neuron in response to a given stimulus vary from one trial to another. This means that the response of a neuron to a stimulus cannot be described by a deterministic function. In other words, the stimulus-output relationship of neurons is stochastic. This is the only fact that this observation tells us (note that we may also argue that stochasticity only reflects uncertainty about hidden variables). That this stochasticity is entirely captured by an intrinsic time-varying rate signal is pure speculation at this stage. Therefore, the argument of spike train variability is about stochastic vs. deterministic theories, not about rate-based vs. spike-based theories. It only discards deterministic spike-based theories based on absolute spike timing. However, the prevailing spike-based theories are based on relative timing across different neurons (for example synchrony or rank order), not on absolute timing.

In fact, the argument can be returned against rate-based theories. It is often written or implied that rate-based theories take into account biological variability, whereas spike-based theories do not. But actually, quite the opposite is true. Rate-based theories are fundamentally deterministic, and a deterministic description is obtained at the cost of averaging noisy responses over many neurons, or over a long integration time. On the other hand, spike-based theories take into account individual spikes, and therefore do not rely on averaging. In other words, rate-based descriptions do not actually account for more of the observed variability: they acknowledge that neural responses are noisy, but then average the noise away rather than accounting for it. Accounting for more variability would require stochastic spike-based accounts. This confusion may stem from the fact that spike-based theories are often described in deterministic terms. But as stressed above, rate-based theories are also described in deterministic terms.

Throwing dice can be described by deterministic laws of mechanics. The fact that the outcomes are variable does not invalidate the laws of mechanics. It simply means that noise (or chaos) is involved in the process. Therefore criticizing spike-based theories for not being stochastic is not a fair point, and stochasticity of neural responses cannot be a criterion to distinguish between rate-based and spike-based theories.

Rate vs. timing (II) Rate in spike-based theories

To complement the previous post, I will comment on what firing rate means in spike-based theories. First of all, rate is important in spike-based theories. The timing of a spike can only exist if there is a spike. Therefore, the firing rate determines the rate of information transmission in spike-based theories, but it does not determine the content of that information.

A related point is energy consumption. The energy consumption of a cell is essentially proportional to the number of spikes it produces (taking into account the cost of synaptic transmission to target neurons) (Attwell and Laughlin, 2001). It seems reasonable to think that the organism tries to avoid wasting energy; therefore a cell that fires at a high rate must be doing something important. In terms of information, it is likely that the amount of information transmitted by a neuron is roughly proportional to, or at least correlated with, its firing rate.

From these two observations, it follows that, in spike-based theories, firing rate is a necessary correlate of information processing in a neuron. This stands in contrast with rate-based theories, in which rate is the basis of information processing. But both types of theories predict that firing rates correlate with various aspects of stimuli – and therefore that there is information about stimuli in firing rates, at least for an external observer.

Rate vs. timing (I) A category error

This post starts a series on the debate between rate-based and spike-based theories of neural computation and coding. My primary goal is to clarify the concepts. I will start by addressing a few common misconceptions about the debate.

Misconception #1: “Both rate and spike timing are important for coding, so the truth is in between”. This statement, I will argue, is what philosophers would call a “category error”: it is not that only one of the alternatives can be right, it is just that the two alternatives belong to different categories.

Neurons mainly communicate with each other using trains of spikes – at least this is what the rate vs. timing debate is concerned with. A spike train is completely characterized by the timing of its spikes. The firing rate, on the other hand, is an abstraction, defined only in a limit involving an infinite number of spikes. For example, it can be defined for a single neuron as a temporal average: the inverse of the mean inter-spike interval. Note that rate is thus itself defined from the timing of spikes. These are two different concepts: spike timing is what defines spike trains, whereas rate is an abstract mathematical construction on spike trains. Therefore the rate vs. timing debate is not about which one is right, but about whether rate is a sufficiently good description of neural activity. Spike-based theories do not necessarily claim that rate does not matter; they reject the notion that rate is the essential quantity that matters.

There are different ways to define the firing rate: over time (number of spikes divided by the duration, in the limit of infinite duration), over neurons (average number of spikes in a population of neurons, in the limit of an infinite number of neurons) or over trials (average number of spikes over an infinite number of trials). In the third definition (which might be the prevailing view), the rate is seen as an intrinsic time-varying signal r(t) and spikes are seen as random events occurring at rate r(t). In all these definitions, rate is an abstract quantity defined on the spike trains. Therefore when stating that the neural “code” is based on rates rather than spike timing, what is meant is that the concept of rate captures most of the important details of neural activity and computation, while precise spike timing is essentially meaningless. On the other hand, when stating that spike timing matters, it is not meant that rate is meaningless; it simply means that precise timing information cannot be discarded. Thus, these are not two symmetrical views: the stronger assumptions are on the side of the rate-based view. Now of course each specific spike-based theory makes a number of possibly strong assumptions. But the general idea that the neural “code” is based on individual spikes and not just rates is not based on strong assumptions. The rate-based view is based on an approximation, which may be a good one or a bad one. This is the nature of the rate vs. timing debate.
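
A toy sketch of these three definitions on simulated data (my own example, independent Poisson-like spike trains generated from a hypothetical r(t)); note how each estimate only approaches a well-defined value in the corresponding limit (long duration, many neurons, many trials):

```python
import numpy as np

rng = np.random.default_rng(3)
dt, T = 1e-3, 2.0
t = np.arange(0, T, dt)
r_true = 20 + 10 * np.sin(2 * np.pi * 0.5 * t)   # hypothetical underlying rate r(t), in Hz

n_trials, n_neurons = 200, 200
# spikes[i, j, k]: did neuron j spike in time bin k on trial i? (independent Poisson-like trains)
spikes = rng.random((n_trials, n_neurons, t.size)) < r_true * dt

# 1) Temporal average: one neuron, one trial, spike count divided by duration.
rate_temporal = spikes[0, 0].sum() / T
# 2) Population average: one trial, one time bin, averaged over neurons.
rate_population = spikes[0, :, 1000].mean() / dt
# 3) Trial average: one neuron, one time bin, averaged over trials (the r(t) view).
rate_trial = spikes[:, 0, 1000].mean() / dt

print(f"temporal: {rate_temporal:.1f} Hz | population: {rate_population:.1f} Hz | "
      f"trial-averaged: {rate_trial:.1f} Hz | true r(t) in that bin: {r_true[1000]:.1f} Hz")
```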

On the role of voluntary action in perception

The sensorimotor theory of perception considers that to perceive is to understand the effect of active movements on sensory signals. Gibson’s ecological theory also places an emphasis on movements: information about the visual world is obtained by producing movements and registering how the visual field changes in lawful ways. Poincaré also defined the notion of space in terms of the movements required to reach an object or compensate for movements of an object.

Information about the world is contained in the sensorimotor “contingencies” or “invariants”, but why should it be important that actions are voluntary? Indeed, one could see movements as another kind of sensory information (e.g. proprioceptive information, or “efference copy”), and a sensorimotor law is then just a law defined on the entire set of accessible signals. I will propose two answers below. I only address the computational problem (why voluntary action is useful), not the problem of consciousness.

Why would it make a difference that action is voluntary? The first answer I will give comes from ideas discussed in robotics and machine learning, known as active learning, curiosity or optimal experiment design. Gibson remarked that the term “information” is misleading when applied to sensory inputs. The senses cannot be seen as a communication channel, because the world does not send messages to be decoded by the organism. In fact rather the opposite is true: the organism actively seeks information about the world by making specific actions that improve its knowledge. A good analogy is the game “20 questions”. One participant thinks of an object or person. The other tries to discover it by asking questions that can only be answered by yes or no. She wins if she can guess the object within 20 questions. Clearly it is very difficult to guess from the answers to randomly chosen questions. But by asking smart questions, one can quickly narrow the search down to the right object. In fact, with 20 questions one can discriminate up to 2^20 ≈ one million objects. Thus voluntary action is useful for efficiently exploring the world. Here by “voluntary” it is simply meant that action is a decision based on previous knowledge, which is intended to maximally increase future knowledge.

I can see another way in which voluntary action is useful, by drawing an analogy with philosophy of science. If perception is about inferring sensory or sensorimotor laws, then it raises an issue common to the development of science, which is how to infer universal laws from a finite set of observations. Indeed there are an infinite number of universal laws that are consistent with any finite set of observations – this is the problem of inductivism. Karl Popper argued that science progresses not by inferring laws, but by postulating falsifiable theories and testing them with critical experiments. Thus action can be seen as the test of a perceptual hypothesis. Perception without action is like science based on inductivism. Action can decide between several consistent hypotheses, and the fact that it is voluntary is what makes it possible to distinguish between causality and correlation (a fundamental problem raised by Hume). Here “voluntary” means that the action could have been different.

In summary, voluntary action can be understood as the test of a perceptual hypothesis, and it is useful both in establishing causal relationships and in efficiently exploring relevant hypotheses.

Marr’s levels of analysis and embodied approaches

Marr described the brain as an information-processing system, and argued it had to be understood at three distinct conceptual levels:

1) The computational level: what does the system do? (for example: estimating the location of a sound source)

2) The algorithmic/representational level: how does it do it? (for example: by calculating the maximum of cross-correlation between the two monaural signals)

3) The physical level: how is it physically realized? (for example: with axonal delay lines and coincidence detectors)

This is what Francisco Varela describes as “computational objectivism”. That is, the purpose of the computation is to extract information about the world, in an externally defined representation. For example, to extract the interaural time difference between the two monaural sounds. Varela describes the opposite view as “neurophysiological subjectivism”, according to which perception is a result of neural network dynamics. Neurophysiological subjectivism is problematic because it fails to fully recognize the defining property of living beings, which is teleonomy. Jacques Monod (who received the Nobel Prize for his work in molecular biology) articulated this idea by explaining that living beings, through the mechanics of evolution, differ from non-living things (say, a mountain) by the fact that they have a fundamental teleonomic project, which is “invariant reproduction” (in Le Hasard et la Nécessité). The achievement of this project relies on specific molecular mechanisms, but it would be a mistake to think that the achievement of the project is the consequence of these mechanisms. Rather, the existence of mechanisms consistent with the project is a consequence of evolutionary pressure selecting these mechanisms: the project defines the mechanisms rather than the other way round. This is a fundamental aspect of life that is downplayed in neurophysiological subjectivism.

Thus computational objectivism improves on neurophysiological subjectivism by acknowledging the teleonomic nature of living beings. However, a critical problem is that the goal (first level) is defined in terms that are external to the organism. In other words, a critical issue is whether the three levels are independent. For example, in sound localization, a typical engineering approach is to calculate the interaural time differences as a function of sound direction, then estimate these differences by cross-correlation and invert the mapping. This approach fails in practice because these binaural cues in fact depend on the shape of the head (among other things), which varies across individuals. One would then have to specify a mapping that is specific to each individual, and it is not reasonable to think that this might be hard-coded in the brain. This simply means that the algorithmic level (#2) must in fact be defined in relationship with the embodiment, which is part of level #3. This is in line with Gibson’s ecological approach, in which information about the world is obtained by detecting sensory invariants, a notion that depends on the embodiment. Essentially, this is the idea of the “synchrony receptive field” that I developed in a recent general paper (Brette, PLoS CB 2012), and before that in the context of sound localization (Goodman and Brette, PLoS CB 2010).
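
For concreteness, here is the cross-correlation step of that engineering approach in its idealized form (a toy sketch with made-up signals, before any head-dependent complication is taken into account): the ITD is estimated as the lag that maximizes the cross-correlation of the two monaural signals.

```python
import numpy as np

rng = np.random.default_rng(4)
fs = 44100                                       # sampling rate (Hz)
n = fs // 2
source = rng.standard_normal(n)                  # broadband source signal (made up)

itd_samples = 15                                 # "true" delay: 15 samples ≈ 340 µs at 44.1 kHz
left = source
right = np.concatenate([np.zeros(itd_samples), source[:-itd_samples]])
right = right + 0.1 * rng.standard_normal(n)     # a little noise on one ear

# Cross-correlate over a plausible range of lags and pick the maximum.
max_lag = 40
lags = np.arange(-max_lag, max_lag + 1)
xcorr = [np.dot(left[max_lag:-max_lag], np.roll(right, -lag)[max_lag:-max_lag]) for lag in lags]
best_lag = lags[int(np.argmax(xcorr))]
print(f"estimated ITD: {best_lag / fs * 1e6:.0f} µs")
```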

However, this still leaves the computational level (#1) defined in external terms, although the algorithmic level (#2) is defined in more ecological terms (sound location rather than ITD). The sensorimotor approach (and related approaches) closes the loop by proposing that the computational goal is to predict the effect of movements on sensory inputs. This implies the development of an internal representation of space, but space is a consequence of this goal, rather than an ad hoc assumption about the external world.

Thus I propose a redefinition of the three levels of analysis of a perceptual system that is more in line with embodied approaches:

1) Computational level: to predict the sensory consequences of actions (sensorimotor approach) or to identify the laws that govern sensory and sensorimotor signals (ecological approach). Embodiment (previously in level 3) is taken into account in this definition.

2) Algorithmic/representational level: how to identify these laws or predict future sensory inputs? (the kernel in the kernel-envelope theory in robotics)

3) Neurophysiological level (previously physical level): how are these principles implemented by neurons?

Here I am also postulating that these three levels are largely independent, but the computational level is now defined in relationship with the embodiment. Note: I am not postulating independence as a hypothesis about perception, but rather as a methodological choice.

Update. In a later post about rate vs. timing, I refine this idea by noting that, in a spike-based theory, levels 2 and 3 are in fact not independent, since algorithms are defined at the spike level.

"The brain uses all available information"

In discussions of “neural coding” issues, I have often heard the idea that “the brain uses all available information”. This idea generally pops up in response to the observation that neural responses are complex and vary with stimuli in ways that are difficult to comprehend. In this variability there is information about stimuli, and as complex as the mapping from stimuli to neural responses may be, the brain might well be able to invert this mapping. I sympathize with the notion that neural heterogeneity is information rather than noise, but I believe that, phrased in this way, this idea reveals two important misconceptions.

First of all, there is often a confusion between sensitivity (responses vary along several stimulus dimensions) and information (you can recover these dimensions from the responses). I made this point in a specific paper two years ago (pdf). Neural responses are observed for a specific experimental protocol, which is always constrained to a limited set of stimuli. One can often recover stimulus dimensions from the responses within this set, but it is a mistake to conclude that the brain can do it, because this inverse mapping depends on the particular experimental set of stimuli. In other words, the mapping is in fact from the observed neural responses and the knowledge of the experimental protocol to the stimulus. The brain does not have access to such external knowledge. Therefore, information is always highly overestimated in this type of analysis. This is in fact a classical problem in machine learning, related to the issues of training vs. test error, generalization and overfitting. The key concept is robustness: the hypothesized inverse mapping should be robust to large changes in the set of stimuli.
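
Here is a toy sketch of the train/test point (entirely made-up data, my own example): a linear decoder fit to high-dimensional “neural responses” recorded in a small stimulus set reconstructs the stimulus perfectly within that set, yet fails on new stimuli drawn from the same protocol.

```python
import numpy as np

rng = np.random.default_rng(5)
n_train, n_test, n_cells = 30, 200, 100          # few recorded trials, many "neurons" (made up)

stim_train = rng.standard_normal(n_train)
stim_test = rng.standard_normal(n_test)
# Responses: weakly stimulus-sensitive, high-dimensional, mostly noise.
w = 0.1 * rng.standard_normal(n_cells)
resp_train = np.outer(stim_train, w) + rng.standard_normal((n_train, n_cells))
resp_test = np.outer(stim_test, w) + rng.standard_normal((n_test, n_cells))

# Least-squares "decoder" fit on the recorded stimulus set only.
decoder, *_ = np.linalg.lstsq(resp_train, stim_train, rcond=None)
err_train = np.mean((resp_train @ decoder - stim_train) ** 2)
err_test = np.mean((resp_test @ decoder - stim_test) ** 2)
print(f"train error: {err_train:.3f} | test error: {err_test:.3f} | stimulus variance: {stim_test.var():.3f}")
```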

The second misconception is more philosophical, and has to do with the general investigation of “neural codes”. What is a code? It is a way of representing information. But sensory information is already present at the level of sensory inputs, and it is a theorem that information can only decrease along a processing chain. So if we say that the goal of a code is only to represent the maximum amount of information about stimuli, then what is gained by having a second (central) code, which can only be a degraded version of the initial sensory inputs? Thinking in this way is in fact committing the homunculus fallacy: looking at the neural responses as a projection of sensory inputs, which “the brain” observes. This projection achieves nothing, for it still leaves unexplained how the brain makes sense of sensory inputs – nothing has been gained in terms of what these inputs mean. At some point there needs to be something else than just representing sensory inputs in a high-dimensional space.
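
The theorem invoked here is the data processing inequality: if a central representation Z is computed from the sensory input Y, which itself depends on the stimulus X (a Markov chain X → Y → Z), then

```latex
I(X; Z) \;\le\; I(X; Y)
```

that is, the central representation cannot carry more stimulus information than the sensory input it is computed from.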

The answer, of course, is that the goal of a “neural code” is not just to represent information, but to do so in a way that makes it easier to process relevant information. This is the answer provided by representational theories (e.g. David Marr). One might also argue that the very notion of a neural code is misleading, because the role of a perceptual system is not to encode sensory inputs but to guide behavior, and that it is therefore more appropriate to speak of computation rather than of a code. In either view, the relevant question when interpreting neural responses is not how the rest of the brain can make use of them, but rather how they participate in solving the perceptual problem. I believe one key aspect is behavioral invariance, for example the fact that you can localize a sound source independently of its level (within a certain range). Another key aspect is that the “code” should in some way be easier to decode for “neural observers” (not just any observer).