What is sound? (VII) The phenomenology of pitch

So far, I have focused on an ecological description of sound, that is, how sounds appear from the perspective of an organism in its environment: the structure of sound waves captured by the ears in relationship with the sound-producing object, and the structure of interaction with sounds. There is nothing psychological per se in this description. It only specifies what is available to our perception, in a way that does not presuppose knowledge about the world. I now want to describe subjective experience of sounds in the same way, without preconceptions about what it might be. Such a preconception could be, for example, to say: pitch is the perceptual correlate of the periodicity of a sound wave. I am not saying that this is wrong, but I want to describe the experience of pitch as it appears subjectively to us, independently of what we may think it relates to.

This is in fact the approach of phenomenology. Phenomenology is a branch of philosophy that describes how things are given to consciousness, our subjective experience. It was introduced by Edmund Husserl and developed by a number of philosophers, including Merleau-Ponty and Sartre. The method of “phenomenological reduction” consists in suspending all beliefs we may have on the nature of things, to describe only how they appear to consciousness.

Here I will briefly discuss the phenomenology of pitch, which is the percept associated with how high or low a musical note is. A vowel also produces a similar experience. First of all, a pure tone feels like a constant sound, unlike a tone modulated at a low frequency (say, a few Hz). This simple remark is already quite surprising. A pure tone is not a constant acoustical wave at all; it oscillates at a fast rate. Yet we feel it as a constant sound, as if nothing were changing in it at all. At the same time, we are not insensitive to this rate of change of the acoustical wave: if we vary the frequency of the pure tone, it feels very different. This feeling is what is commonly associated with pitch: when the frequency is increased, the tone feels “higher”; when it is decreased, it feels “lower”. Interestingly, the language we use to describe pitch is that of space. I am too influenced by my own language and my musical background to tell whether we actually feel high sounds as being physically high, but it is an interesting observation. What is clear is that low-pitched sounds tend to feel larger than high-pitched sounds – again a spatial dimension.

A very distinct property of pitch is that changing the frequency of the tone, i.e., the temporal structure of the sound wave, does not produce a perceptual change along a temporal dimension. Pitch is not temporal in the sense of: there is one thing, and then there is another thing. With a pure tone, there always seems to be a single thing, not a succession of things. In contrast, with an amplitude-modulated tone, one can feel that the sound becomes sequentially (but continuously) louder and softer. In the same way, if one hits a piano key, the loudness of the sound decreases. In both cases there is a distinct feeling of time associated with the change in amplitude of the sound wave. And this feeling does not exist with the fast amplitude change of a tone. This simple observation demonstrates that phenomenological time is distinct from physical time.

Another very salient point is that when the loudness of the sound of the piano key decreases, the pitch does not seem to change. Somehow the pitch seems to be invariant to this change. I would qualify this statement, however, because this might not be true at low levels.

When the frequency of a tone increases, the sound seems to go higher, as when one asks a question. When it decreases, it seems to go lower, as when one ends a sentence. Here there is a feeling of time (first it is lower, then it is higher), corresponding to the temporal structure of the frequency change at a fast timescale.

Now when one compares two different sounds from the same instrument in sequence, there is usually a distinct feeling of one sound being higher than the other one. However, when the two sounds are very close in pitch, for example when one tunes a guitar, it can be difficult to tell which one is higher, even though it may be clearer that they have distinct pitches. When one plays two notes of different instruments, it is generally easy to tell whether it is the same note, but not always which one is higher. In fact the confusion is related to the octave similarity: if two notes are played on a piano, differing by an octave (which corresponds to doubling the frequency), they sound very similar. If they are played together instead of sequentially, they seem to fuse, almost as a single note. It follows that pitch seems to have a somewhat circular or helicoidal topology: there is an ordering from low to high, but at the same time pitches of notes differing by an octave feel very similar.

If one plays a melody on one instrument and then the same melody on another instrument, they feel like the same melody, even though the acoustic waves are very different, and certainly they sound different. If one plays a piano key, then it is generally easy to immediately sing the same note. Of course when we say “the same note”, it is actually a very different acoustical wave that is produced by our voice, and yet it feels like it has the same level of “highness”. These observations certainly support the theory that pitch is the perceptual correlate of the periodicity of the sound wave, with the qualification that low repetition rates (e.g. 1 Hz) actually produce a feeling of temporal structure (change in loudness or repeated sounds, depending on what is repeated in the acoustical wave) rather than a lower pitch.

The last observation is intriguing. We can repeat the pitch of a piano key with our voice, and yet most of us do not possess absolute pitch, the ability to name the piano key, even with musical training. It is intriguing because the muscular commands to the vocal system required to produce a given note are absolute, in the sense that they do not depend on musical context. This means, for most of us who do not possess absolute pitch, that these commands are not available to our consciousness as such. We can sing a note that we just heard, but we cannot sing a C. This suggests that we actually possess absolute pitch at a subconscious level.

I will come back to this point. Before that, we need to discuss relative pitch. What is meant by “relative pitch”? Essentially, it is the observation that two melodies played in different keys sound the same. This is not a trivial fact at all. Playing a melody in a different key means scaling the frequency of all notes by the same factor, or equivalently, playing the fine structure of the melody at a different rate. The resulting sound wave is not at all like the original sound wave, either in the temporal domain (at any given time the acoustical pressures are completely different) or in the frequency domain (the spectra could be non-overlapping). The melody sounds the same when the fundamental frequencies are multiplied by the same factor, not when they are shifted by the same quantity. Note also that the melody is still recognizable when the duration of notes or gaps is changed, when the tempo is different, when expressivity is changed (e.g. the loudness of notes) or when the melody is played staccato. This fact calls into question neurophysiological explanations based on adaptation.
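
To make the arithmetic concrete, here is a toy numerical sketch (the note frequencies are approximate and the code is only an illustration): multiplying all fundamental frequencies by a common factor preserves the frequency ratios, i.e., the musical intervals, whereas adding a common offset distorts them.

```python
# Toy illustration: transposition preserves frequency ratios (intervals),
# a constant frequency shift does not.
melody = [262.0, 330.0, 392.0]  # approximate fundamentals of C4, E4, G4 in Hz

def ratios(freqs):
    """Successive frequency ratios, corresponding to musical intervals."""
    return [f2 / f1 for f1, f2 in zip(freqs, freqs[1:])]

transposed = [f * 1.5 for f in melody]  # all fundamentals scaled by the same factor
shifted = [f + 100.0 for f in melody]   # all fundamentals shifted by the same amount

print(ratios(melody))      # ~[1.26, 1.19]
print(ratios(transposed))  # same ratios: intervals preserved
print(ratios(shifted))     # different ratios: intervals distorted
```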

Thus, it seems that, at a conscious level, what is perceived is primarily musical intervals. But even this description is probably not entirely right. It suggests that the pitch of a note is compared to the previous one to make sense. But if one hears the national anthem with a note removed, it will not feel like a different melody, but like the same melody with an ellipsis. It is thus more accurate to say that a note makes sense within a harmonic context, rather than with respect to the previous note.

This point is in fact familiar to musicians. If a song is played and one is then asked to sing another song, the singer will tend to start the melody in the same key as the previous song. The two songs are unrelated, so thinking in terms of intervals does not make sense. But somehow there seems to be a harmonic context in which notes are interpreted.

Now the fact that there is such an effect of the previous song means that the harmonic context is maintained in working memory. It does not seem to require any conscious effort or attention, as when one tries to remember a phone number. Somehow it stays there, unconsciously, and determines the way in which future sounds are experienced. It is not even clear to us whether a harmonic context is being held in memory or whether it has been “forgotten”.

Melodies can also be remembered for a long time. A striking observation is that it is impossible for most people to recall a known melody in the right key, the key in which it was originally played, and it is also impossible to tell whether the melody, played by someone else, is played in the right key. Somehow the original key is not memorized. Thus it seems that it is not the fundamental frequency of notes that is memorized. One could imagine that intervals are memorized rather than notes, but as I noted earlier, this is probably not right either. More plausible is the notion that it is the pitch of notes relative to the harmonic structure that is stored (i.e., pitch is relative to the key, not to the previous note).

We arrive at the notion that both the perception and the memory of pitch are relative, and they seem to be relative in a harmonic sense, i.e., relative to the key and not in the sense of intervals between successive notes. Now what I find very puzzling is that the very fact that we can sing means that, at a subconscious level but not at a conscious level, we must have a notion of absolute pitch.

Another intriguing point is that we can imagine a note, play it in our head, and then try to play it on a piano: it may sound like the note we played, or it may sound too high or too low. We are thus able to make a comparison between a note that is physically played and a note that we consciously imagine. But we are apparently not conscious of pitch in an absolute sense, in a way that relates directly to properties of physical sounds. The only way I can see to resolve this apparent contradiction is to say that we imagine notes as degrees in a harmonic context (or musical scale), i.e., “tonic” for the C note in a C key, “dominant” for the G note in a C key, etc, and in the same way we perceive notes as degrees. The absolute pitch, independent of the musical key, is also present but at a subconscious level.

I have only addressed a small portion of the phenomenology of pitch, since I have barely discussed harmony. But clearly, it appears that the phenomenology of pitch is very rich, and also not tied to the physics of sound in a straightforward way. It is deeply connected with the concepts of memory and time.

In light of these observations, it appears that current theories of pitch address very little of the phenomenology of pitch. In fact, all of them (both temporal and spectral theories) address the question of absolute pitch, something that most of us actually do not have conscious access to. It is even more limited than that: current models of pitch are meant to explain how the fundamental frequency of a sound can be estimated by the nervous system. Thus, they start from the physicalist postulate that pitch is the perceptual correlate of sound periodicity, which, as we have seen, is not unreasonable but remains a very superficial aspect of the phenomenology of pitch. They also focus on the problem of inference (how to estimate pitch) and not on the deeper problem of definition (what is pitch, why do some sounds produce pitch and not others, etc.).

Rate vs. timing (XVI) Flavors of spike-based theories (5) Rank order coding

I started with an overview of spike-based theories based on synchrony. I would like to stress that synchrony-based theories should not be mistaken for theories that predict widespread synchrony in neural networks. In fact, quite the opposite: since synchrony is considered a meaningful event, it is implied that it is a rare event (otherwise it would not be informative). But there are also theories that do not assign any particular role to synchrony, which I will discuss now.

One popular theory based on asynchrony is rank order coding, or “first spike” theories. It was popularized in particular by Simon Thorpe, who showed that humans can categorize faces in such a short time that any neuron involved in the processing chain could fire little more than one spike. This observation rules out theories based on temporal averages, both rate-based theories and interval-based theories. Instead, Simon Thorpe and colleagues proposed that the information is carried by the order in which spikes are fired. Indeed, receptors that are more excited (receiving more light) generally fire earlier, so that the order in which receptors are excited carries information that is isomorphic to the pattern of light on the retina. However, by itself, the speed of processing does not rule out processing schemes that do not require temporal averages, for example synfire chains or rate-based schemes based on spatial averages. Indeed it is known that the speed at which the instantaneous firing rate of a population of noisy neurons can track a time-varying input is very fast and is not limited by the membrane time constant – an interesting point is that this fact is consistent with integrate-and-fire models and not with isopotential Hodgkin-Huxley models, but this is another story. However, one argument against rate-based schemes is that they are much less energetically efficient, since information is used only after averaging. To be more precise, a quantity can theoretically be estimated from the firing of N neurons with a precision of order 1/N if their responses are coordinated, but of order 1/√N if the neurons are independent. In other words, the same level of precision requires N² neurons in a rate-based scheme vs. N neurons in a spike-based scheme.
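
As a rough numerical illustration of this scaling argument (a deliberately crude toy readout of my own, not the original calculation): if N neurons have coordinated, evenly spaced thresholds, a quantity x can be read from their spike count with an error of order 1/N, whereas if each neuron fires independently with probability x, the error is of order 1/√N.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.37  # quantity to be estimated, between 0 and 1

def coordinated_error(N):
    # Neurons with evenly spaced thresholds: the spike count is essentially
    # floor(x * N), so the readout error is at most ~1/N (quantization error).
    thresholds = (np.arange(N) + 0.5) / N
    count = np.sum(x > thresholds)
    return abs(count / N - x)

def independent_error(N, trials=10000):
    # Independent neurons, each firing with probability x: the readout error
    # (standard deviation of count/N) scales as 1/sqrt(N).
    counts = rng.binomial(N, x, size=trials)
    return np.std(counts / N)

for N in [10, 100, 1000]:
    print(N, coordinated_error(N), independent_error(N))
# The first error shrinks roughly as 1/N, the second only as 1/sqrt(N),
# so reaching the same precision requires ~N^2 independent neurons.
```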

Computationally, first-spike codes are not fundamentally different, at a conceptual level, from standard rate-based codes, because first-spike latency is monotonically related to input intensity. However, one interesting difference is that if only the rank order, and not the exact timing, is taken into account, then the code becomes invariant to monotonic transformations of input intensity, for example global changes in contrast or luminance. However, it is not invariant to more complex transformations.
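
A minimal sketch of this invariance (a toy latency model of my own, in which latency simply decreases with intensity): the exact latencies change under a monotonic transformation of the intensities, but the rank order does not.

```python
import numpy as np

rng = np.random.default_rng(1)
intensities = rng.random(8)  # input intensity for each of 8 receptors

def first_spike_latencies(I):
    # Toy model: more strongly driven receptors fire earlier.
    return 1.0 / (I + 0.1)

def rank_order(latencies):
    # The order in which the receptors fire (indices sorted by latency).
    return np.argsort(latencies)

original = rank_order(first_spike_latencies(intensities))
rescaled = rank_order(first_spike_latencies(0.3 * intensities))  # global contrast change
remapped = rank_order(first_spike_latencies(intensities ** 2))   # another monotonic transform

print(np.array_equal(original, rescaled))  # True
print(np.array_equal(original, remapped))  # True: rank order is preserved,
                                           # even though the exact latencies are not
```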

Rank order codes are also different at a physiological level. Indeed an interesting aspect of this theory is that it acknowledges a physiological fact that is ignored by both rate-based theories and synchrony-based theories, namely the asymmetry between excitation and inhibition in neurons. How can a neuron be sensitive to the temporal order of its inputs? In synchrony-based theories, which rely on excitation, neurons are sensitive to the relative timing of their inputs rather than to their temporal order. Indeed temporal order is discontinuous with respect to relative timing: it abruptly switches at time lag 0. Such a discontinuity is provided by inhibition: excitation followed by inhibition is more likely to trigger a spike than inhibition followed by excitation. The asymmetry is due to the fact that spikes are produced when the potential exceeds a positive threshold (i.e., the trajectory crosses the threshold from below).

One criticism of rank order coding is that it requires a time reference. Indeed, when comparing two spike trains, any spike is both followed and preceded by a spike from the other train, unless only the “first spike” is considered. Such a time reference, which defines the notion of “first spike”, could be the occurrence of an ocular saccade, or the start of an oscillation period if there is a global oscillation in the network that can provide a common reference.

Rate vs. timing (XV) Flavors of spike-based theories (4) Synchrony as a sensory invariant

I finish this overview of synchrony-based theories with my recent proposal (PLoS Comp Biol 2012). In the next posts, I will discuss theories based on asynchrony. In the theories I have described so far, the starting point is a code based on spike timing, in general a spatiotemporal pattern of spikes assumed to represent some sensory input. But the connection between the sensory input and the spike pattern is not addressed, or at least not considered a central issue. My proposal connects spike-based computation with the psychological theory of James Gibson, specifically the notion of structural invariant. Gibson starts his book “The Ecological Approach to Visual Perception” by criticizing the idea that perception is the process of inferring the objective properties of the world from ambiguous patterns of sensory data, as is often postulated. Indeed, since perception is the source of all knowledge, it is inconsistent to view the objective properties of the world as preexisting to perception. But how then can one know anything about the world?

I will rephrase Gibson’s thinking in a different way by using the dictionary analogy. Inferring the objective world from an image or some sensory data is like looking in a dictionary for the translation of a word in one’s native language. In fact, this is precisely what is generally meant by the “neural coding” metaphor. But this cannot be used to understand a new word in one’s own native language. Instead one uses a different kind of dictionary, in which the word is defined in relationship with other words. Thus the definition of objects in the world is relational, not inferential. Inference can only be secondary, since one must first know what is to be inferred.

How does this relate to perception? Gibson argues that information about the world is present in the invariant structure of sensory inputs, that is, in properties of sensory inputs (relationships) that persist through time, which is to say: the laws that sensory inputs follow. More precisely, invariant structure is a relationship that is invariant with respect to some change. The notion has been extended to sensorimotor relationships by Kevin O’Regan. Two simple examples in hearing are pitch perception and sound localization. Sounds that evoke a pitch are generally periodic. Periodicity is a relationship on the sensory input, i.e., S(t+T)=S(t) for all times (T is the period and S(t) is the acoustical pressure), and it is precisely this relationship, rather than the spectrum of the sound, which defines the pitch (for example pitch is unchanged if the fundamental frequency is missing). This relationship is not spatial because it is unaffected by movements. In the same way, a sound source produces two acoustical waves at the two ears that have specific relationships; for example (if sound diffraction is neglected), the wave at the contralateral ear is a delayed version of the wave at the ipsilateral ear. This relationship is spatial because it is affected by one’s movements. In addition, there is a systematic relationship between interaural delay and head position that is isomorphic to the source direction. Therefore this relationship can be identified with the source direction, without the need for an externally defined notion of physical angle. When a sound is presented in a noisy environment, the direction has to be inferred since it is ambiguous, but what is inferred is the relationship that defines the direction.
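
As a small illustration with a toy signal (the parameters are arbitrary; the sampling rate is chosen so that the period is an integer number of samples): a harmonic complex with its fundamental component removed still satisfies S(t+T)=S(t) with T=1/f0, which is why the pitch is defined by the relationship and not by the spectrum.

```python
import numpy as np

f0 = 200.0                     # fundamental frequency (Hz), i.e. the pitch
fs = 40000.0                   # sampling rate, chosen so that fs/f0 is an integer
t = np.arange(0, 0.1, 1 / fs)  # 100 ms of signal

# Harmonic complex with a missing fundamental: only harmonics 2 to 5.
S = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

T = int(round(fs / f0))        # period, in samples
# The relationship S(t+T) = S(t) holds (up to numerical error),
# even though there is no energy at f0 in the spectrum.
print(np.allclose(S[T:], S[:-T], atol=1e-6))  # True
```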

How does this relate to synchrony? Simply put, synchrony is a relationship defined through time, so it qualifies as invariant structure. In my paper, I show how this relationship between spike timings can correspond to a relationship between sensory inputs by introducing the concept of “synchrony receptive field” (SRF). The SRF of a given pair of neurons is the set of sensory signals that elicit synchronous spiking in the two neurons (it can be extended to a group of neurons). Suppose the two neurons receive different versions of the sensory signal S: F(S) and G(S). Then assuming a deterministic mapping from signals to spikes, synchrony reflects the relationship F(S) = G(S), a relationship defined on the sensory inputs. Therefore, across the neural population, synchrony patterns reflect the set of relationships on the sensory inputs, and neurons that respond to coincidences signal these relationships.
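
Here is a toy sketch of the idea in the spirit of the sound localization example (threshold crossings of a slowly varying signal stand in for a real neuron model, and all numbers are arbitrary): two neurons receive the two ear signals through their own delay lines, and they fire synchronously exactly when the difference of their delays compensates the interaural delay, i.e., when their two versions of the signal are equal (F(S) = G(S)).

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 10000                       # samples per second
t = np.arange(0, 1.0, 1 / fs)

# A slowly varying source signal (smoothed noise).
S = np.convolve(rng.standard_normal(t.size), np.ones(200) / 200, mode="same")

def spike_times(signal, threshold=0.05):
    # Spikes at upward threshold crossings (a crude stand-in for a neuron).
    above = signal > threshold
    return np.where(~above[:-1] & above[1:])[0]

def delayed(signal, delay_samples):
    return np.roll(signal, delay_samples)

itd = 30                          # interaural delay of the source, in samples
left, right = S, delayed(S, itd)  # the two ears receive delayed copies

# Each neuron of a pair receives one ear's signal through its own delay line.
# The pair fires synchronously when delay_A - delay_B matches the ITD, i.e. F(S) = G(S).
for delay_A, delay_B in [(40, 10), (25, 10)]:
    spikes_A = spike_times(delayed(left, delay_A))
    spikes_B = spike_times(delayed(right, delay_B))
    synchronous = np.array_equal(spikes_A, spikes_B)
    print(delay_A - delay_B, "synchronous" if synchronous else "not synchronous")
```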

This mechanism can be used practically to recognize relationships, for example: detecting an odor defined by ratios of receptor affinities, estimating the pitch of a sound, estimating the location of a sound source (Goodman and Brette, PLoS CB 2010), estimating binocular disparities. The key computational benefit is that it solves the difficult problem of invariance, that is, the fact that the objects of perception present themselves under many different perspectives. For example a face can be seen under different angles, or a source at the same location can produce different sounds. To be more precise, the problem is dissolved by this approach rather than solved. Indeed, the key insight is that invariance is only a problem for an inferential process. When the objects to be perceived are defined instead by relationships, then there is no invariance problem, since a relationship is itself an invariant. For example, periodicity is a relationship that is invariant to the spectrum of a sound.

The theory connects to the other spike-based theories I mentioned previously. Indeed sensory relationships are reflected by synchrony between neurons (or relative spike timing, considering conduction delays), and the theory addresses the problem of binding in the same way as synfire chains: sensory signals that are not temporally coherent, and therefore not originating from the same object, cannot produce synchronous firing. It also connects with the polychronous theory of working memory: the spike pattern that is stored in that theory corresponds here to a sensory relationship. This makes it possible to store sensory relationships in the form of spike timing relationships, without the need for an explicit conversion to a “rate code”.

On the empirical side, the theory relies on the fact that neurons operate in a fluctuation-driven regime, in which excitation and inhibition are approximately balanced (or inhibition dominant), as empirically observed. But shouldn’t this theory predict widespread synchrony in neural populations, unlike what is observed in the brain? In fact it should not. First of all, synchrony is only informative if it is a rare event. This is precisely what is captured by the concept of synchrony receptive field: synchrony occurs only for specific sensory signals (or more precisely, sensory relationships). Even though I did not include it in the paper, it would actually make sense that correlations that are not stimulus-specific (i.e., those that can be predicted) are minimized as much as possible. This would support the idea that recurrent inhibition is tuned to cancel excitatory correlations (see my previous post), which would produce weak correlations on average.

Rate vs. timing (XIV) The neural "code"

I am taking a break before I continue my review of spike-based theories, because I want to comment on the notion of “neural code”. These are keywords that are used in a large part of the neuroscience literature, and I think they are highly misleading. For example, you could say that neurons in V1 “code for orientation”. But what this statement refers to, in reality, is simply that if we record the response of such a neuron to an oriented bar, then we observe that its firing rate is modulated by the orientation, peaking at an orientation that is then called the “preferred orientation”. First of all, the notion of a “preferred orientation” is just a definition tied to the specific experimental protocol (the same is true for the notion of best delay in sound localization). Empirically speaking, it is an empty statement. In particular, by itself it does not mean that the cell actually “prefers” some orientations to others in any way, because a preferred orientation can be defined in any case and could be different for different protocols – it is just the stimulus parameter giving the maximum response. So the only empirical statement associated with the claim “the neuron codes for orientation” is in fact: the neuron’s firing rate varies with orientation. Therefore, using the word “codes” is just a more appealing way of saying “varies”, but the empirical content is actually no more than “varies”.

In what sense then can we say that the neuron “codes” for orientation? Coding means presenting some information in a way that can be decoded. That is, the neuron codes for an orientation with its firing rate in the sense that from its firing rate it is possible to infer the orientation. Here we get to the first big problem with the notion of a “neural code”. If the firing rate varies with orientation and one knows exactly how (quantitatively), then of course it is possible to infer some information about orientation from the firing rate. The way one would decode a particular firing rate into an estimated orientation is by looking at the tuning curve, obtained through the experimental protocol, and looking for the orientation that gives the best matching firing rate. But this means that the decoding process, and therefore the code, is meant from the experimenter’s point of view, not from the organism’s point of view. The organism does not know the experimental protocol, so it cannot make the required inference. If all the organism can use to decode the orientation is a number of spikes, then clearly this task is nearly impossible, because without additional knowledge, a tremendous number of stimuli could produce that same number of spikes (e.g. by varying contrast or simply presenting something other than a bar). Thus the first point is that the notion of a code is experimenter-centric, so talking about a “neural code” in this sense is highly misleading, as the reader of this code is not neurons but the experimenter.
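
To illustrate this with made-up numbers: if the firing rate of a V1-like unit depends on both orientation and contrast, then a given firing rate is compatible with many different orientations, and inverting the tuning curve only works if one already knows the rest of the protocol.

```python
import numpy as np

def firing_rate(orientation_deg, contrast, preferred=90.0, width=20.0):
    # Made-up tuning curve: the rate depends on orientation AND contrast.
    tuning = np.exp(-((orientation_deg - preferred) ** 2) / (2 * width ** 2))
    return 50.0 * contrast * tuning  # spikes/s

# Two very different stimuli producing (almost) the same firing rate:
r1 = firing_rate(orientation_deg=90.0, contrast=0.4)  # preferred orientation, low contrast
r2 = firing_rate(orientation_deg=63.0, contrast=1.0)  # off-orientation, high contrast
print(r1, r2)  # both about 20 spikes/s

# From ~20 spikes/s alone, one cannot tell which orientation was presented
# without also knowing the contrast, i.e. the experimental protocol.
```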

So the first point is that, if the notion of a neural code is to make any sense at all, it should be refined so as to remove any reference to the experimental protocol. One clear implication is that the idea that a single neuron can code for anything is highly questionable: is it possible to infer anything meaningful about the world from a single number (a spike count), with no a priori knowledge? Perhaps the joint activity of a set of neurons may make more sense. This reduces the interest of “tuning curves” in terms of coding – they may still be informative about what neurons “care about”, but not about how they represent information, if there is such a thing. Secondly, removing any reference to the experimental protocol means that one can speak of a neural code for orientation only if it does not depend on other aspects, e.g. contrast. Indeed, if the responses were sensitive to orientation but also to everything else in the stimulus, how could one claim that the neuron codes for orientation? Finally, thinking of a code with a neural observer in mind means that, perhaps, not all codes make sense. Indeed, is the function of V1 to “represent” the maximum amount of visual information? This view, and the search for “optimal codes” in general, seems very odd from the organism’s point of view: why devote so many neurons and so much energy to representing exactly the same amount of information that is already present in the retina? If a representation has any use, then this representation must be different in nature from the original presentation, and not just in content. So the point is not about how much information there is, but in what form it is represented. This means that codes cannot be envisaged independently of a potential decoder, i.e., a specific way in which neurons use the information.

I now come to a deeper criticism of the notion of a neural code. I started by showing that the notion is often meant in the sense of a code for the observer, not for the brain. But let us say that we have fixed that error and we are now looking for neural codes, with a neural-centric view rather than an experimenter-centric view. Still, the methodology is: looking at neural responses (rates or spike timings) and trying to find how much information there is and in what form. Clearly then, the notion that neurons code for things is not an empirical finding: it is an underlying assumption of the methodology. It starts, not ends, by assuming that neurons fire so that the rest of the brain can observe their firing and extract the information contained in it. It is postulated, not observed, that what neurons do is produce some form of representation for the rest of the brain to see. This appears to be very centered on the way we, external observers, acquire knowledge about the brain, and it has a strong flavor of the homunculus fallacy.

I suggest we consider another perspective on what neurons do. Neurons are cells that continuously change in many aspects, molecular and electrical. Even though we may want to describe some properties of their responses, spikes are transient signals; there is nothing persistent in them, in the way there is in a painting. So neurons do not represent the world in the same way as a painter would represent the world. Second, spikes are not things that a neuron leaves there for observers to see, like the pigments on a painting. On the contrary, a neuron produces a spike and actively sends it to target neurons, where changes will occur because of this spike. This is much more like an action than like a representation. Thus it is wrong to say that the postsynaptic neuron “observes” the activity of presynaptic neurons. Rather, it is influenced by it. So neural activity is not a representation; it is rather an action.

To briefly summarize this post: neurons do not code. This is a view that can only be adopted by an external observer, but it is not very meaningful to describe what neurons do. Perhaps it is more relevant to say that neurons compute. But the best description, probably, is to say that neurons act on other neurons by means of their electrical activity. To connect with the general theme of this series, these observations emphasize the fact that the basis of neural computation is truly spikes, and that rates are an external observer-centric description of neural activity.

Rate vs. timing (XIII) Flavors of spike-based theories (3) Polychronization

In a synfire chain, activity propagates synchronously from one layer to the next. Transmission delays are identical for all synaptic connections between two layers. Bienenstock proposed the notion of a synfire braid for the case when transmission delays are heterogeneous (Bienenstock (1994), A model of neocortex, see Appendix A). The idea was expanded by Izhikevich under the terminology “polychronization” (Izhikevich, Neural Comp 2006; Szatmary and Izhikevich, PLoS CB 2010), and related to Edelman’s theory of neural Darwinism (see his 1988 book).

Polychronization relies on the same propagation mechanism as synfire chains, but a polychronous group differs from a synfire chain in that there are no layers per se. When a set of neurons fire spikes at times such that, added to the transmission delays, they arrive simultaneously at a common target neuron, this neuron may fire. A spatio-temporal pattern of activity congruent with synaptic delays may then propagate along the “polychronous group”. Polychronization is a natural generalization of synfire chains, but there are interesting new properties. First, in a recurrent neural network, there are potentially many more polychronous groups than neurons, unlike synfire chains, and each neuron may participate in many such groups. In fact, in theory, the exact same set of neurons can participate in two different groups, but in a different activation order. In a recurrent network, polychronous groups spontaneously ignite at random times and for short durations. This means in particular that repeating spatiotemporal patterns would be very difficult to observe in spontaneous activity. It is important to emphasize that “polychronization” is not a specific mechanism, in that it does not rely on particular anatomical or physiological mechanisms other than what is currently generally accepted. It occurs in balanced excitatory-inhibitory networks with irregular activity and standard spike-timing-dependent plasticity, and in such conditions neurons are known to be highly sensitive to coincidences. In other words, it is a particular perspective on the dynamics of such networks.
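
To make the mechanism concrete, here is a toy sketch (arbitrary delays and spike times, far simpler than the actual simulations): the target neuron fires when spike times plus conduction delays coincide, so a spatiotemporal pattern congruent with the delays activates the group while the reverse pattern does not.

```python
# Toy polychronous activation: spike times plus conduction delays determine
# whether spikes coincide at the target neuron (times in ms, made up).
delays = {"A": 7.0, "B": 4.0, "C": 1.0}  # conduction delay from each neuron to the target

def target_fires(spike_times, window=1.0, n_required=3):
    arrivals = sorted(spike_times[n] + delays[n] for n in spike_times)
    # The target fires if enough spikes arrive within the coincidence window.
    return any(arrivals[i + n_required - 1] - arrivals[i] <= window
               for i in range(len(arrivals) - n_required + 1))

# Spike times congruent with the delays: A fires first, C last,
# and all three spikes arrive at the target at t = 10 ms.
print(target_fires({"A": 3.0, "B": 6.0, "C": 9.0}))  # True

# Same neurons, same intervals, but in the reverse order: no coincidence.
print(target_fires({"A": 9.0, "B": 6.0, "C": 3.0}))  # False
```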

The interesting application of polychronization is working memory (Szatmary and Izhikevich 2010). As I mentioned, the general theoretical context is Edelman’s theory of neural Darwinism. Edelman received the Nobel prize in the 1970s for his work on the immune system, and he then moved to neuroscience, where he developed a theory that relies on an analogy with the immune system. In that system, there is a preexisting repertoire of antibodies, which are more or less random. When a foreign antigen is introduced, it binds to some of these antibodies, and the response is then amplified by clonal multiplication – much like in Darwinian evolution theory. Here a memory item is presented in the form of a specific spatiotemporal pattern of activation. This pattern may be congruent with some of the connection delays of the network, i.e., it may correspond to a polychronous group (or part of one): this corresponds to the binding of antibodies to an antigen. It is then hypothesized that these connections are amplified through quick associative plasticity, i.e., STDP acting on a short time scale, fading over about 10 seconds (one hypothesis relies on NMDA spikes). Note that this is very similar in spirit to von der Malsburg’s “dynamic link matching”. This step corresponds to clonal amplification in the immune system. Because of the reinforcement of the connections, the polychronous group is spontaneously reactivated at random times, until it ultimately fades out. In this way the spatiotemporal pattern is replayed for a few tens of seconds.

The articulation of this theory with empirical evidence is rather interesting in the context of the rate vs. timing debate. First, it builds on generally accepted empirical evidence, including irregular firing statistics and excitatory-inhibitory balance. The theory is based on precise spike timing, but spike timing is not reproducible. Indeed spike timing is not locked to the stimulus, and only relative spike timing matters. What is more, in different trials, different polychronous groups may be selected, and therefore even relative spike timing may not be reproducible across trials (though it probably is within a trial). Another interesting observation is that the theory predicts that some neurons should show an increasing (ramping) firing rate after stimulus presentation. This is not because these neurons “encode duration” with their rate, but because as the polychronous group is spontaneously reactivated, its length progressively increases, which means that neurons in the group’s tail fire more and more often over the course of reactivation.

Without judging the validity of the polychronization theory, I note that it provides a concrete example of a spike-based theory that appears consistent with many aspects of neural statistics (irregularity, lack of reproducibility, etc). This fact demonstrates once more that these aspects cannot be used as arguments in support of rate-based theories.

Rate vs. timing (XII) Flavors of spike-based theories (2) Binding by synchrony

Before I discuss polychronization, I would like to complement the previous post with a brief discussion of the binding problem, and the idea of binding by synchrony. Synfire chains address this issue (see in particular Bienenstock (1994), “A model of neocortex”), but they are not the only such proposition based on spike timing. In the 1980s, Christoph von der Malsburg developed a theory of neural correlations as a binding mechanism (1981, “The correlation theory of brain function”). He wrote a very clear review in a special issue of Neuron on the binding problem (1999).

In classical neural network theory, objects are represented by specific assemblies of neurons firing together. A problem arises when two objects are simultaneously presented, the “superposition catastrophe”: now two assemblies are simultaneously active, and it may be ambiguous whether the entire set of neurons corresponds to one big assembly or to two smaller assemblies, and if there are two assemblies it may be unclear which neurons belong together. There is a specific example due to Rosenblatt (1961, “Principles of neurodynamics”): suppose there are two neurons coding for shape, one for squares and another for triangles, and two neurons coding for location, one for top and another for bottom. This type of coding scheme, which von der Malsburg calls “coarse coding”, is efficient when there are many possible dimensions, because the number of possible combinations increases exponentially with the number of dimensions. But in the classical neural network framework, it fails when a square and a triangle are simultaneously presented: indeed all four neurons are activated, and it is impossible to tell whether the square is above the triangle.

The proposed solution to the superposition catastrophe is to use spike timing as a signature of objects. In this example, neurons coding for the same object would fire at the same time, so that the two neural assemblies can be distinguished unambiguously. von der Malsburg mentions in this review that the source of synchrony can be external (as in my paper on computing with synchrony) or internal – one proposition being that binding is mediated by transient gamma oscillations (see Wolf Singer’s work). But he also warns that it may be difficult to find experimental evidence of such a mechanism, because in his mind synchrony should be transient, since it has to be dynamic, and should involve many neurons, so as to immediately impact postsynaptic neurons (this is related to my post on the difference between synchrony and correlation). Thus observing such transient distributed synchrony requires simultaneously recording a large number of neurons, and perhaps knowing which of these neurons are relevant.
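
Here is a toy rendering of Rosenblatt’s example (the “time tags” are arbitrary integers standing in for firing phases): with plain activation sets, the two scenes are indistinguishable; with a temporal signature per object, they are not.

```python
# Four "coarse coding" neurons: shape (square, triangle) and location (top, bottom).
# Scene 1: square on top, triangle at bottom.
# Scene 2: triangle on top, square at bottom.
scene1 = {"square", "triangle", "top", "bottom"}
scene2 = {"square", "triangle", "top", "bottom"}
print(scene1 == scene2)  # True: same set of active neurons (superposition catastrophe)

# Binding by synchrony: each object imposes its own firing time (arbitrary units),
# so neurons belonging to the same object fire together.
scene1_timed = {"square": 0, "top": 0, "triangle": 1, "bottom": 1}
scene2_timed = {"triangle": 0, "top": 0, "square": 1, "bottom": 1}
print(scene1_timed == scene2_timed)  # False: the synchrony pattern disambiguates the scenes
```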

He postulates that binding requires a fast synaptic plasticity mechanism, “dynamic link matching”, the ability to quickly form connections between neurons coding for different features. The idea applies to the binding problem, i.e., representing several objects at the same time, but also to the compositionality problem, which is slightly different and perhaps more general. A good example, perhaps, is working memory for melodies. A melody is a sequence of musical notes with specific pitches and durations. One can hear a melody for the first time and repeat it (e.g. sing it). If the melody is really novel, this ability cannot rely on “combination cells”; it really is the particular sequence of notes that must be kept in memory. This example instantiates both the binding problem and the compositionality problem. There is a binding problem because each note is defined by two properties, pitch and duration, that have to be bound together. There is a compositionality problem because the notes must be organized in the correct order, and not just as a set of notes. So what needs to be represented is not just a set of features, but the organization (the links) between these features, the knowledge of which note comes after which. In mathematical terms, one needs to represent a directed graph over the individual notes. For this problem, classical connectionism seems insufficient – for example, neural models of working memory based on attractors.

To summarize this post and the previous post: in these approaches, spike-based theories were introduced to address two shortcomings of standard rate-based neural network theories, the binding problem and the compositionality problem. Therefore, if proponents of these theories still need to find empirical evidence, proponents of rate-based theories also still need to respond to these criticisms – compositionality being probably the most difficult problem.

The machine learning analogy of perception

To cast the problem of neural computation in sensory systems, one often refers to the standard framework in machine learning. A typical example is as follows: there is a dataset, which could be for example a set of images, and the goal is to learn a mapping between these images and categories, for example faces or cars. In the learning phase, labels are externally given to these images, and the machine learning algorithm builds a mapping between images and labels. As an analogy of what sensory systems do, the question is then: how do neurons learn this mapping, e.g. to fire when they are presented with an image of a given category? This question is the starting point of many theories in computational neuroscience. It is essentially an inference problem: to each category corresponds a distribution of images, and so what sensory systems must do is learn this distribution and compute what the most likely category is for a given presented image. This is why Bayesian approaches are appealing from this point of view, because an efficient sensory system should then be an ideal Bayesian observer. It just follows from the way the problem of perception is cast.

But is this actually a good analogy? In fact, it differs from the problems sensory systems actually face in at least three important ways:

1) elements of the data set are considered independent;

2) these elements are externally given;

3) the labels are externally defined.

First of all, elements of the data set are never independent in a real perceptual system. On the contrary, there is a continuous flow of sensory input. Vision is not a slideshow. The visual field changes through time in a continuous way, and more importantly the changes are lawful because objects are embedded in the physical world. We can perceive these laws, for example the rigidity of movements, and this is something that cannot be found in the “slideshow” view of vision that is implied by the machine learning analogy. I believe this is the main message of James Gibson. Moreover, there are lawful relationships in the sensory inputs, but there are also sensorimotor relationships. This is information that can be picked up from the sensory or sensorimotor flow, not by inference from the distribution of slides in the slideshow. This means that perception is not (or not only) inferential but relational: sensory inputs are analyzed in reference to themselves (their internal structure), and not (only) to memory.

A second point is that in the machine learning analogy, elements of the dataset are considered given, and the algorithm reacts to them. In psychology, this view corresponds to behaviorism, in which the organism is only considered from a stimulus-reaction point of view. But in fact a more ecologically accurate view is that data are in general produced by the actions of the organism, rather than passively received. Gibson criticized the information-processing viewpoint for this reason: the world does not produce messages to be decoded by a receiver; on the contrary, a perceptual system samples its environment. It is really the opposite view: the organism does not react to a stimulus, but rather the environment reacts to the actions of the organism, and it is this reaction that is analyzed by the organism. In the machine learning field, there are frameworks that try to address this aspect, known as “active learning”: the algorithm chooses a data element and asks for its label, for example to maximize the information that can be gained.

Finally, in the machine learning analogy, the label is externally defined. But in a closed system, this is not possible. The organism must define the relevant categories by itself. But how can these categories be defined a priori? Often, this problem is dismissed by what I would call “evolutionary magic”: these categories are provided by “evolution” because they are important to the survival and reproduction of the animal. I call it “magic” because the teleological argument does not provide any explanation at all: it is about as metaphysical as if “evolution” were replaced by “God”, in the sense that it has the same explanatory power. Invoking intergenerational changes of the organism does not solve the problem: whatever mechanism is involved, pressure for change still has to come from the environment and the way the organism can interact with it, not from an external source.

In fact, this problem was addressed by the development of phenomenology in philosophy, introduced by Husserl about a century ago. Followers of the phenomenological approach include Merleau-Ponty and Sartre. The idea is the following. What “really” exists in the world is a metaphysical question: it actually does not matter for the organism if it makes no difference to its experience. For example, is there such a thing as “absolute space”, the existence of an absolute location of things? The question is metaphysical because only relative changes in space can be experienced (the relative location of things) – this point was noted by Henri Poincaré. In the phenomenological approach, “essence” is what remains invariant under changes of perspective. I believe this is related to a central point in Gibson’s theory: information is given in the “structural invariants” present in the sensory inputs. These invariants do not need an external reference to be noticed.

For example, consider a sound source that produces two acoustical waves at the two ears. Neglecting sound diffraction, these acoustical waves are identical apart from a propagation delay (the interaural time difference or ITD). When a sound is produced by the source, this property is invariant through time – it is a law that is always satisfied. But what makes it a spatial property? It is spatial because the property is broken when movements are produced by the organism (e.g. head movements). In addition, there is a higher-order property, the relationship between the interaural delay and the movements of the head, which remains true as long as the source does not move. This structural invariant is then information about the location of the sound source; in fact, the relationship can be mapped to the physical location of the source. But the “label” here is intrinsically defined: it is precisely the relationship between head position and ITD. Thus labels can be intrinsically defined, as the sensory and sensorimotor structure. This is the postulate of the sensorimotor account of perception, according to which perception is precisely the anticipated effect of the organism’s actions on the sensory inputs.
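
To make this concrete, here is a sketch of the relationship using the standard far-field approximation ITD ≈ (d/c)·sin(angle), which neglects diffraction (the ear distance and all numbers are arbitrary):

```python
import numpy as np

def itd(source_azimuth_deg, head_azimuth_deg, ear_distance=0.2, c=343.0):
    # Far-field, no-diffraction approximation: the ITD (in seconds) depends only
    # on the angle of the source relative to the head.
    relative = np.radians(source_azimuth_deg - head_azimuth_deg)
    return (ear_distance / c) * np.sin(relative)

source = 30.0  # fixed source direction (degrees)
for head in [0.0, 15.0, 30.0, 45.0]:
    print(head, itd(source, head))
# The mapping head_position -> ITD is the same lawful function for every head
# position, only shifted by the source direction; it is this relationship,
# not the ITD at one instant, that specifies where the source is.
```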

The fact that these labels can be intrinsically defined is, I believe, what James Gibson means when he states that information is “picked up” and that perception is “direct”. But I would like to go further: there is no doubt that there can be inference in perception, and so in that sense perception cannot be entirely direct. For example, one can visually recognize an object that is partially occluded, and imagine the rest of the object (“amodal perception”). But the point is that what is inferred, i.e., the “label” in machine learning terminology, is not an externally given category, but the sensory or sensorimotor structure, part of which is hidden. The main difference is that there is no need for an external reference. For example, in the sound localization example, a brief sound may be presented from a given direction. Then the sensorimotor structure that defines source direction for the organism is hidden, since the sound is no longer there when the organism can turn its head. So this structure is inferred from the ITD. In other words, what is inferred is not an angle, which would make no sense for an animal that has no measurement tool; it is the effect of its own movements on the perceived ITD. So there is inference, but inference is not the basis of perception. It cannot be, for how would you know what should be inferred? For this reason, Gibson rejects inference with the argument that it would lead to an infinite regress. As I have tried to explain, it is not inference per se that is problematic, but the idea that it might be the basis of perception.

This is quite important for our view of neural computation: this means that Bayesian inference is not so central anymore in the function of sensory systems. Certainly, inference is useful and perhaps necessary in many cases. But perhaps more important is the discovery of sensory and sensorimotor structure, that is, the elaboration of what is to be inferred. This requires the development of a theory of neural computation that is primarily relational rather than inferential.

In summary, labels can be intrinsically defined by the invariant structure of sensory and sensorimotor signals. I would like to end this post with another important Gibsonian notion: “affordances”. Gibson thought that we perceive “affordances”, which are what the objects of perception allow in terms of interaction. For example, a door affords opening, a wall affords blocking, etc. This is an important notion, because it defines meaning in terms of things that make sense to the organism, rather than in externally defined terms.

To conclude, a theory of neural computation that takes into account these points should differ from standard theories in the following way: it should be

1) relational (discovering internal structure) rather than inferential (comparing with memory),

2) active (inputs are not questions but answers) rather than passive (inputs are questions, actions are answers), and

3) subjective (meaning is defined by the interaction with the environment) rather than objective (objects are externally defined).

Rate vs. timing (XI) Flavors of spike-based theories (1) Synfire chains

I will now give a brief overview of different spike-based theories. This is not meant to be an exhaustive review – although I would be very glad to have some feedback on what other theories would deserve to be mentioned. My aim is rather to highlight a common theme in these theories, which distinguishes them from rate-based theories. At the same time, I also want to emphasize the diversity of these theories.

“Synfire chains” were introduced by Moshe Abeles in the 1980s (see his 1991 book: Corticonics: Neural Circuits of the Cerebral Cortex), although in fact the concept can be traced back to 1963 (Griffith, Biophysical Journal 1963). It is based on the observation that neurons are extremely sensitive to coincidences in their inputs (in the fluctuation-driven regime) – I commented on this fact in a previous post. So if a small number of input spikes are received synchronously by a neuron (on the order of ten), then the neuron spikes. Now if these presynaptic neurons also have postsynaptic neurons in common, then these postsynaptic neurons will also fire synchronously. By this mechanism, synchronous activity propagates from one group of neurons to another group. This is a feedforward mode of synchronous propagation along a chain of layers (hence the term “synfire chains”), but note that this feedforward mode of propagation may be anatomically embedded in a recurrent network of neurons. It is important to note that this propagation mode 1) can only be stable in the fluctuation-driven regime (as opposed to the mean-driven regime), and 2) is never stable for perfect integrators (without the leak current). A simple explanation of the phenomenon was presented by Diesmann et al. (Nature 1999), in terms of dynamical systems theory (synchronous propagation is a stable fixed point of the dynamics). In that paper, neurons are modeled with background noise – it is not a deterministic model. Much earlier, in 1963, Griffith presented the deterministic theory of synfire chains (without the name) in discrete time and with binary neurons (although he did consider continuous time at the end of the paper). He also considered the inclusion of inhibitory neurons in the chain, and showed that it could yield stable propagation modes with only a fraction of neurons active in each layer. Izhikevich extended the theory of synfire chains to synchronous propagation with heterogeneous delays, which he termed “polychronization” (I will discuss it later).
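
Here is a toy caricature of this propagation mode (coincidence counting with arbitrary parameters, not an integrate-and-fire simulation): a temporally compact volley of spikes propagates from layer to layer and stays compact, whereas a temporally dispersed volley dies out.

```python
import numpy as np

rng = np.random.default_rng(3)

def propagate(layer_spike_times, n_next=100, n_inputs=20, threshold=10,
              window=2.0, delay=5.0, jitter=0.5):
    """Toy propagation step (times in ms): a next-layer neuron fires if at least
    `threshold` of its sampled inputs fall within `window` of their median."""
    if len(layer_spike_times) < n_inputs:
        return np.array([])  # not enough active inputs: the volley dies out
    next_times = []
    for _ in range(n_next):
        inputs = rng.choice(layer_spike_times, size=n_inputs, replace=False)
        center = np.median(inputs)
        coincident = np.sum(np.abs(inputs - center) < window / 2)
        if coincident >= threshold:
            next_times.append(center + delay + jitter * rng.standard_normal())
    return np.array(next_times)

# A synchronous volley (0.5 ms dispersion) propagates and remains synchronous...
layer = rng.normal(0.0, 0.5, size=100)
for i in range(5):
    layer = propagate(layer)
    print(i, len(layer), round(layer.std(), 2) if len(layer) else None)

# ...whereas a dispersed volley (10 ms dispersion) fails to propagate.
layer = rng.normal(0.0, 10.0, size=100)
for i in range(5):
    layer = propagate(layer)
    if len(layer) == 0:
        print("propagation failed at layer", i)
        break
```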

As I have described them so far, synfire chains are postulated to result from the dynamics of neural networks, given what we know of the physiology of neurons. This raises two questions: 1) do they actually exist in the brain? and 2) are they important? I do not think there is a definite empirical answer to the first question (especially considering possible variations of the theory), but it is interesting to consider its rejection. If one concludes, on the basis of empirical evidence, that synfire chains do not exist, then another question pops up: why are they not observed, given that they seem to be implied by what we know of neural physiology? Possibly then, there could be specific mechanisms to avoid the propagation of synchronous activity (e.g. cancellation of expected correlations by inhibition, which I mentioned in a previous post). I am not taking sides here, but I would simply like to point out that, because the existence of synfire chains is deduced from generally accepted knowledge, the burden of proof should in fact be shared between the proponents of synfire chains and their opponents: either synfire chains exist, or there is a mechanism that prevents them from existing (or at least their non-existence deserves an explanation). Before someone objects that models of recurrent spiking neural networks do not generally display synfire activity, I would like to point out that these are generally sparse random networks (i.e., with no short cycles in the connectivity graph) in which the existence of the irregular state is artificially maintained by external noisy input (e.g. Brunel (2000)) or by a suprathreshold intrinsic current (e.g. Vogels and Abbott 2005).

The second question is functional: what might be the computational advantage of synfire chains? As I understand it, they were not introduced for computational reasons. However, a computational motivation was later proposed by Bienenstock (1996, in Brain Theory) in reference to the binding problem, or more generally compositionality. A similar proposition was also made by Christoph von der Malsburg and Wolf Singer, although in the context of oscillatory synchrony, not synfire chains. A scene is composed of objects that have relationships to each other. Processing such a scene requires identifying the properties of objects, but also identifying the relationships between these objects, e.g. that different features belong to the same object. In a rate-based framework, the state of the network at a given time is given by a vector of rates (one scalar value per neuron). If the activation of each neuron or set of neurons represents a given property, then the network can only represent an unstructured set of properties. The temporal organization of neural activity may then provide the required structure. Neurons firing in synchrony (at a given timescale) may then represent properties of the same object. The proposition makes sense physiologically because presynaptic neurons can only influence a postsynaptic target if they fire together within the temporal integration window of the postsynaptic neuron (where synchrony is seen from the postsynaptic viewpoint, i.e., after axonal propagation delays). In my recent paper on computing with neural synchrony (which is not based on synfire chains but on stimulus-specific synchrony), I show on an olfactory example how properties of the same object can indeed be bound together by synchrony. I also note that this provides a way to filter out irrelevant signals (noise), because they are not temporally coherent.

Apart from compositionality, the type of information processing performed by synfire chains is very similar to that of feedforward networks in classical neural network theory. That is, the activation (probability of firing) of a given unit is essentially a sigmoidal function of a weighted sum of activations in the preceding layer (if heterogeneous synaptic weights are included) – neglecting the temporal dispersion of spikes. But there may be another potential computational interest of synfire chains, compared to traditional rate-based feedforward models, which is processing speed. Indeed, in synfire chains, the propagation speed is limited by axonal conduction delays, not by the neural integration time.

In the next post, I will comment on polychronization, an extension of synfire chains that includes heterogeneous delays.

Reader's digest (12 Dec 2012)

I am starting a new series of posts on this blog, called “Reader’s digest”. These are simply bibliographical notes on recent readings (of recent or old papers).

While reading Abeles’s Scholarpedia entry on synfire chains, I learned that although Abeles introduced the term “synfire chain”, the concept is in fact older. It seems to date back to Griffith (1963). In that paper, he considers threshold binary neurons, in discrete time. Previous authors had shown that networks of such neurons could only be in one of two stationary states, quiescent or fully active (Beurle 1956; Ashby et al. 1962), and this was seen as a paradox, since this is apparently not what happens in the nervous system. This is reminiscent of a problem that is still present in the current literature: the stability of the persistent irregular state in closed networks of spiking neurons (indeed most models use either external noise to maintain activity (Brunel 2000) or a suprathreshold intrinsic current, as in Vogels & Abbott (2005)).

In his paper, Griffith introduces a “transmission line”, which is exactly a synfire chain, except with discrete rather than continuous time (an approximation that he acknowledges and even reconsiders at the end of the paper). He demonstrates (with calculations involving binomial distributions and the central limit theorem) that indeed there is only a single stable mode of propagation (all neurons active) but that if inhibitory neurons are also included, then there may be another stable mode, with only a fraction of neurons being active. He also mentions the possibility of unstable oscillations due to inhibition (which corresponds to what is now called the “PING” mechanism, pyramidal-interneuron gamma).
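
As a rough numerical counterpart of Griffith’s analytical argument (my own sketch, with made-up numbers), one can iterate the fraction of active neurons from one layer of the transmission line to the next:

    from scipy.stats import binom

    # Each neuron receives n excitatory inputs from the previous layer, each
    # active with probability p, and fires if at least theta of them are active.
    n, theta = 100, 50

    def next_fraction(p):
        return binom.sf(theta - 1, n, p)   # P(number of active inputs >= theta)

    p = 0.55
    for k in range(10):
        p = next_fraction(p)
        print(k, round(float(p), 4))
    # Without inhibition, p is driven towards 0 or 1: quiescence and full
    # activity are the only stable modes, which is Griffith's starting point.

As Griffith shows, adding a small number of strong inhibitory neurons modifies this map and can create a second stable fixed point, at an intermediate fraction of active neurons.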

The paper is interesting for two reasons: 1) it includes methods of calculation relevant for synfire chain propagation, and 2) it seems to provide a solution to the problem of the stability of irregular activity, based on a small number of strong inhibitory neurons. This latter point applies both to synfire chains and to more traditional recurrent networks.

 

References

Ashby, W.R., von Foerster, H. & Walker, C.C., 1962. Instability of Pulse Activity in a Net with Threshold. Nature, 196(4854), pp. 561-562.

Beurle, R.L., 1956. Properties of a Mass of Cells Capable of Regenerating Pulses. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 240(669), pp. 55-94.

Brunel, N., 2000. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3), p. 183.

Griffith, J.S., 1963. On the Stability of Brain-Like Structures. Biophysical Journal, 3(4), pp. 299-308.

Vogels, T.P. & Abbott, L.F., 2005. Signal Propagation and Logic Gating in Networks of Integrate-and-Fire Neurons. The Journal of Neuroscience, 25(46), pp. 10786-10795.

What is computational neuroscience? (VII) Incommensurability and relativism

I explained in previous posts that new theories should not be judged by their agreement with the current body of empirical data, because these data were produced by the old theory. In the new theory, they may be interpreted very differently or even considered irrelevant. A few philosophers have gone so far as to state that different theories are incommensurable, that is, they cannot be compared with each other because they have different logics (e.g. observations are not described in the same way in the different theories). This reasoning may lead to relativistic views of science, that is, the idea that all theories are equally “good” and that choosing between them is a matter of personal taste or fashion. In this post I will try to explain these arguments, and also to argue against relativism.

In “Against Method”, Feyerabend explains that scientific theories are defined in a relational way, that is, elements of a theory make sense only in reference to other elements of the theory. I believe this is a very deep remark that applies to theories of knowledge in the broadest sense, including perception, for example. I have drawn a schematic figure below to illustrate the argument.

Theories are systems of thought that relate to the world. Concepts in a theory are meant to relate to the world, and they are defined with respect to other concepts in the theory. A given concept in a given theory may have a similar concept in another theory, but in general it is a different concept. To explain his arguments, Feyerabend uses the analogy of language. It is a good analogy because languages relate to the world, and they have an internal relational structure. Imagine theories A and B are two languages. A word in language A is defined (e.g. in the dictionary) by using other words from language A. A child learns her native language by picking up the relationships between the words, and how they relate to the world she can see. To understand language A, a native speaker of language B may translate the words. However, translation is not definition. It is imprecise because the two words often do not have exactly the same meaning in both languages. Some words may not even exist in one language. A deeper understanding of language A requires going beyond translation, and capturing the meaning of words by acquiring a more global understanding of the language, both in its internal structure and in its relationship with the world.

Another analogy is with political theories and the way they view society. Clearly, a given observation can be interpreted in opposite ways in conservative and liberal political views. For example, the same economic crisis may be seen as the result of public debt or as the result of cuts in public spending (due to the public acquisition of private debt).

These analogies support the argument that an element of a new theory may not be satisfactorily explained in the framework of the old theory. It may only make full sense when embedded in the full structure of the new theory, which means that new theories may be initially unclear and that their concepts may not be well defined. This remark can certainly make different theories difficult to compare, but I would not conclude that theories are incommensurable. This conclusion would be valid if theories were closed systems, because then a given statement would make no sense outside the theory in which it is formulated. Axiomatic systems in mathematics could be said to be incommensurable (for example, Euclidean and non-Euclidean geometries). But theories of knowledge, unlike axiomatic systems, are systems that relate to the world, and the world is shared between different theories (as illustrated in the drawing above). For this reason, translation is imprecise but not arbitrary, and one may still assess the degree of consistency between a scientific theory and the part of the world it is meant to explain.

One may find an interesting example in social psychology. In the theory of cognitive dissonance, new facts that seem to contradict our belief system are taken into account by minimally adjusting that belief system (minimizing the “dissonance” between the facts and the theory). In philosophy of knowledge, these adjustments would be called “ad hoc hypotheses”. When it becomes too difficult to account for all the contradictory facts (making the theory too cumbersome), the belief system may ultimately collapse. This is very similar to the theory of knowledge defended by Imre Lakatos, where belief systems are replaced by research programs. Cognitive dissonance theory was introduced following a field study of a small American sect whose members believed that the world would end on a specific date (Festinger, Riecken and Schachter (1956), When Prophecy Fails. University of Minnesota Press). When that date arrived and the world did not end, strangely enough, the sect did not collapse. On the contrary, the failed prophecy made it stronger, with the followers believing even more firmly in their view of the world. They considered that the world did not end because they had prayed so much that God heard their prayers and postponed the event. So they made a new prediction, which of course turned out to be false. The sect ultimately collapsed, although only after a surprisingly long time.

The example illustrates two points. Firstly, a theory does not collapse because a single prediction is falsified: instead, the theory is adjusted with a minor modification so as to account for the seemingly contradictory observation. Secondly, this process does not go on forever, because of the theory’s interaction with the world: when predictions are systematically falsified, the theory ultimately loses its followers, and for good reason.

In summary, a theory of knowledge is a system in interaction with the world: it has an internal structure, and it relates to the world. And although it may relate to the world in its own terms, one may still assess the adequacy of this relationship. For this reason, scientific relativism cannot be defended in its strongest version.

For the reader of my other posts in this blog, this definition of theories of knowledge might sound familiar. Indeed it is closely related to the theories of perception defended by Gibson, O’Regan and Varela, for example. After all, perception is a form of knowledge about the world. These authors have in common that they define perception in a relational way, as the relationship between the actions of the organism in the world (driven by the “theory”) and the effects of these actions on the organism (“tests” of the theory). This is in contrast with “neurophysiological subjectivism”, for which meaning is intrinsically produced by the brain (a closed system, in my drawing above), and with “computational objectivism”, in which there is a pre-defined objective world (related to the idea of translation).