The hard problem of consciousness explained with sex

By using the word “sex” in the title, I am obviously trying to increase the number of my readers. But that is not the only reason. I noticed that many people, in particular scientists, do not seem to get why there is a problem at all. Philosophers like to refer to the experience of color, and the question is then: why is it that red feels like red, and not like something else, and why does it feel like anything at all? Why is it not just information, as it is for a computer? Why does the experience have a particular quality? Somehow this example does not speak to everyone, so I came up with another example: why is it that sex is associated with pleasure, and in particular, why is male ejaculation generally associated with orgasm?

You will hear several kinds of explanation. One might tell you: your body secretes some particular molecules at that moment, or some particular neuron gets excited, and that's what produces the sensation of pleasure. But of course, this explanation just pushes the problem a little further, by replacing “ejaculation” with “molecule” or “neuron”. Why is it that a particular molecule produces pleasure? Certainly, when it is not in the brain, the molecule does not provide pleasure to a Petri dish.

Another kind of explaining away is the functionalist or behaviorist view: if everything appears the same from an external observer's point of view, then it is the same; there is nothing more than what can be seen. In other words, male orgasm is the act of ejaculating, end of story. But sex does not fit so well with that view. First, it is well known that female orgasm can be faked; second, male orgasm can be dissociated from ejaculation (in tantric sex).

And finally there is the evolutionary explanation: we feel pleasure because it motivates us to have sex, which is necessary for reproduction. But the logic is flawed: we only need a mechanism that makes us have sex, and there is no intrinsic reason why that mechanism should be accompanied by any feeling. Why are we not reproducing machines without feelings?

Here comes the example. The axolotl is a kind of salamander that retains its larval traits throughout its adult life. It's basically a cute baby salamander. Nevertheless, it is able to reproduce. The way it reproduces is interesting (probably not different from other salamanders, but cuter). The male lays a sort of jelly full of spermatozoa on the floor. Then, later, the female comes and inserts the jelly in her belly. After a few days, the female produces eggs.

Now the big question: does the male axolotl have an orgasm when it lays its jelly on the floor?

4 key questions about consciousness and the mind-body problem

It is fair to say that we have little idea about how neural activity gives rise to consciousness, and about the relationship between neural activity and conscious states (i.e., what you are experiencing). This is the mind-body problem. In my opinion, there has been relatively little fundamental progress on this question because it has been addressed mainly within the computationalist framework (i.e., in terms of information processing), which is very inappropriate for this question (this is partly Chalmers' criticism). So below I am listing a number of unanswered questions on this matter, which I believe require a very different kind of approach. First of all, let me remark that because being conscious is always being conscious of something, understanding consciousness is largely about understanding perception at the phenomenal level (perception in the broadest sense, e.g., perceiving your thoughts).

1) How can perception be stable?

Why is it that a pure tone feels like a stable percept when 1) the acoustic wave is time-varying, 2) the activity of neurons everywhere in the brain is dynamic? The same can be said of all senses; in vision, the eyes move at high frequency even when fixating an object, and there is no visual percept if they are forced to be still. More generally: if there is a mapping between states of the brain and percepts, then why is it that percepts are not changing all the time?

A thought experiment. Imagine the state of the brain is held fixed: someone scratches her nose, and time is stopped. Would you still experience something? Any conscious experience seems to require a change, not just a state. This suggests that the relevant mapping is actually not from brain states to percepts, but from brain activity to percepts. This immediately raises a problem, because a conscious state can be defined at any point in time, but it is not immediately clear that brain activity can be (as this would reduce activity to state). This is not a fatal problem, though, for there is a precedent in physics: a gas is composed of individual particles, but the pressure of a gas at a given instant cannot be defined as a function of the state of the particles at that moment, because pressure corresponds to the force exerted by the particles impacting a surface. It might be that the relation between neural activity and conscious states is of a similar kind as the relation between mechanics and thermodynamics.
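The pressure analogy can be made concrete with a toy simulation (everything here – the one-dimensional “gas”, the particle count, the time step – is invented purely for illustration, not a physical model): a pressure-like quantity can be read off from wall impacts over a time window, but not from a frozen snapshot of the particles.

```python
import random

# Toy one-dimensional "gas": particles bouncing between walls at 0 and 1.
random.seed(0)
positions = [random.random() for _ in range(1000)]
velocities = [random.choice([-1, 1]) * random.uniform(0.5, 1.5)
              for _ in range(1000)]

def step(dt=0.01):
    """Advance the gas by dt and return the number of right-wall impacts."""
    impacts = 0
    for i in range(len(positions)):
        positions[i] += velocities[i] * dt
        if positions[i] >= 1.0:            # bounce off the right wall
            positions[i] = 2.0 - positions[i]
            velocities[i] = -velocities[i]
            impacts += 1
        elif positions[i] <= 0.0:          # bounce off the left wall
            positions[i] = -positions[i]
            velocities[i] = -velocities[i]
    return impacts

# A frozen snapshot exerts no "pressure" at all...
print(step(dt=0.0))                         # 0
# ...but any finite time window does.
print(sum(step() for _ in range(200)) > 0)  # True
```

The point is only structural: “impacts per window” is a property of activity, and it vanishes when the window shrinks to an instant – just as, on this view, a conscious state might not be reducible to an instantaneous brain state.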

Two more thought experiments. 1) Record the firing of all neurons in the brain, then play it back on a set of unconnected light-emitting diodes: does that set feel the same experience? 2) (adapted from Chalmers) Replace every other neuron in the brain, at random, with an artificial neuron that interacts with other neurons in exactly the same way as the neuron it replaces: would there be a conscious experience? My personal answers would be: (1) no and (2) yes, and this suggests to me that the right substrate to look at is not neural activity as a state (e.g. the firing rates of all neurons) but neural activity as an interaction between neurons.

 

2) What is time for a conscious brain?

A fundamental property of consciousness is its unity: a single conscious entity sees, hears and thinks. If visual and auditory areas were independent and, say, each controlled speech, then one conscious entity would report visual experience and another conscious entity would report auditory experience. It could not be a single conscious entity, since the two relevant parts are physically disconnected. Thus the unity of consciousness requires an interdependence between all the elements that compose it. This is, as I understand it, the issue addressed by a number of biological theories of consciousness, for example Edelman's “reentrant loops” or Tononi's integrated information theory.

However, there is another crucial aspect to this problem which, as far as I know, has received much less attention: the unity of consciousness, or lack of it, in time. There is no general unity of consciousness across time: two things that happen, say, one minute apart produce distinct percepts, not a single one. Clearly, consciousness is dynamic. But the big question is: how can there be a unique conscious state at any given moment in time when all the elements of the conscious network interact with some delay (since they are physical elements), typically of a few milliseconds? And what is time for such a network? Imagine there is a (physical) visual event at time t1 and an auditory event at time t2. At what time do they occur for the network, given that they are sensed at different times by its different elements? Why is it that electricity changes on a millisecond timescale in the brain, yet conscious states seem to change at a much slower rate?

 

3) How can there be an intrinsic relation between neural activity and percepts?

Why is it that a particular pattern of neural activity produces the experience of redness? Most biological explanations are of this kind: I experience redness because, when some red object is presented, neurons fire in that specific way. This is the coding perspective. The problem with the coding perspective is, of course: who decodes the code? Ultimately, this kind of explanation is strongly dualist: it is implicitly assumed that, at some point, neural activity is transformed into the redness experience by some undetermined process that must be of a very different nature.

I would like to point out that proposals in which perception lies in the interaction between the organism and the environment (e.g. the sensorimotor theory) do not solve this problem either. I can close my eyes and imagine something red. It could be that redness corresponds to a particular way in which visual inputs change when I move my eyes or the surface, which I am anticipating or imagining, but this does not explain what is intrinsically red about the pattern of neural activity now. If we cannot explain it without referring to what happened before, then we are denying that the pattern of neural activity itself determines experience, and again this is a strong dualist view.

A thought experiment. Consider two salamanders, each of which has only one neuron, which is both a sensory and a motor neuron; say, its firing produces a particular movement. The salamanders are very similar, but their visual receptors are tuned to different wavelengths. In the first salamander, the neuron reacts to red stimuli; in the second, the neuron reacts to blue stimuli. What might happen in terms of visual experience when the neuron fires? Does the first salamander see red and the other blue? If we think that neural activity alone determines experience, then in fact the two salamanders should experience exactly the same thing – and in this case that conclusion is also independent of the sensorimotor contingencies.

 

4) What is the relationship between the structure of experience and the structure of neural activity?

Subjective experience is highly structured. There might be some dispute about how rich it actually is, but it is at least as rich as what you can describe with words. A striking fact about language is that the meaning of sentences is carried not only by the words but also by the relations between them, i.e., the syntax. Likewise, a visual scene is composed of objects with spatial relations between them, and with attributes (a red car in front of a small house). In fact, there must be more to it than syntax; there must also be semantics: if neural activity completely determines subjective experience, it must not only specify that there is a car, but also what a car is. A useful notion in the psychology of perception is the concept of “affordance” introduced by James Gibson: the affordance of an object is what it allows you to do (e.g. a car affords driving). Affordances are potentialities of interaction, and they give some meaning (rather than labels) to perceptual objects. This brings an inferential structure to experience (if I did that, this would happen).

This stands in sharp contrast with the central perceptual concept in neuroscience, the notion that “cell assemblies” represent particular percepts. A cell assembly is simply a set of neurons, and their co-activation represents a particular percept (say, a particular face). Say one neuron represents “red” and another represents “car”; then the assembly of the two neurons represents the red car. The problem with this concept is that it is very poorly structured. It cannot represent relations between objects, for example. This type of representation is known as the “bag-of-words” model in language processing: a text is represented by its set of words, without any syntactic relationships; clearly, the meaning of the text is quite degraded. The concept of the cell assembly is simply too unstructured to represent experience.
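The loss of structure can be made concrete in a few lines of Python (a toy sketch; the sentences reuse the red-car example from above): once a description is reduced to an unordered set of features, the binding of attributes to objects is gone.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as an unordered multiset of its words,
    analogous to a cell assembly as an unstructured set of active neurons."""
    return Counter(text.lower().split())

# Two scenes with different structure...
a = bag_of_words("a red car in front of a small house")
b = bag_of_words("a small car in front of a red house")

# ...collapse onto the same representation: the "assembly"
# {red, small, car, house, ...} cannot say which object is red.
print(a == b)  # True
```

The two descriptions are clearly different experiences, yet their bag-of-words representations are identical – which is exactly the objection to the cell assembly as a representation of structured experience.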

If we are looking for a mapping between neural activity and percepts, then 1) we must find a way to define some structure on neural activity, and 2) the mapping must preserve that structure (in mathematical terms, we are looking for a morphism, not a simple mapping).

I can summarize this discussion by pointing out that to make progress on the mind-body problem, there are two crucial steps: 1) to understand the articulation between physical time and the time of consciousness, 2) to understand the articulation between the structure of neural activity and the structure of phenomenal experience.

The phenomenal content of neural activity

This post is about the mind-body problem. Specifically, what is the relationship between the activity of the brain and the phenomenal content of conscious experience? It is generally thought that experience is somehow produced by the electrical activity of neurons. The caricatural example of this idea is the concept of the “grandmother cell”: a neuron lights up when you think of your grandmother, or conversely, the activation of that neuron triggers the experience of, say, the vision of your grandmother's face. The less caricatural version is the concept of cell assemblies, where the single cell is replaced by a set of neurons. There are variations around this theme, but basically, the idea is that subjective experience is produced by the electrical activity of neurons. There actually is some experimental evidence for this idea, coming from the electrical stimulation of the brains of epileptic patients (read any book by Oliver Sacks). Electrical stimulation is used to locate the epileptic focus in those patients, and depending on where the electrode is in the brain, electrical stimulation can trigger various types of subjective experiences. Epileptic seizures themselves can produce such experiences, for example auditory experiences of hearing specific pieces of music. Migraines can also trigger perceptual experiences (called “aura”), in particular visual hallucinations. So there is some support for the idea of a causal relationship between neural activity and subjective experience.

The obvious question, of course, is: why? At this moment, I have no idea why neural activity should produce any conscious experience at all. We do not believe that the activity of the stomach causes any subjective experience for the stomach, or the activity of any set of cells, including cardiac cells, which also have an electrical activity (but of course, maybe we are wrong to hold this belief).

I propose to start with a slightly more specific question: why does neural activity cause subjective experience of a particular quality? Any conscious experience is an experience of something (a property called intentionality in philosophy), for example the vision of your grandmother's face. Why is it that a particular spatio-temporal pattern of activity in a neural network produces, for that neural network, the experience of seeing a face? One type of answer is to say that this particular pattern has been associated with the actual visual stimulus of the face, i.e., it “encodes” the face, and so the meaning of those neurons lighting up is the presence of that visual stimulus. This is essentially the “neural coding” perspective. But there is a big logical problem here. What if the visual stimulus is not present, but the neurons that “encode” the face light up either naturally (memory, dream) or by electrode stimulation? Why would that produce a visual experience rather than anything else? If experience is produced by neural activity alone, then it should not matter what external stimulus might cause those neurons to fire, or what happened in the past to those neurons, or even what world the neurons live in, but only which neurons fire now. Which neurons fire now should entirely determine, by itself, the content of subjective experience. Again, the problem with the neural coding perspective is that it is essentially dualist: at some stage, there is some other undefined process that “reads the code” and produces subjective experience. The problem we face here is that the firing of neurons itself must intrinsically specify the experience of seeing a face, independent of the existence of an outside world.

I will try to be more specific, with a very simple example. Imagine there is just one neuron, and two stimuli in the world, A and B. Now suppose, by conditioning or even simply by anatomical assumption, that stimulus A makes the neuron fire. A neural coder would say: this neuron codes for stimulus A, and therefore this neuron's firing causes the experience of A. But you could also assume a different situation, maybe a different organism or the same organism conditioned in a different way, where stimulus B, and not A, makes the neuron fire. If neural activity is what causes subjective experience, then this neuron's firing should produce exactly the same experience in both cases, even though a different stimulus makes it fire in each case. This example can be vastly generalized, and the implication is that any two patterns of neural activity that are identical up to a permutation of neurons should produce the same subjective experience for that set of neurons.

As if all this were not puzzling enough, I will now end on a disturbing thought experiment. Imagine we measure the entire pattern of neural activity of someone experiencing the vision of his grandmother. Then we build a set of blinking red lights, one for each neuron, programmed so as to light up at the same times as the neurons did. The red lights don't even need to be connected to each other. The electrical activity of this set of lights is thus the same as the activity of the neural network. Therefore, by the postulate that electrical activity is what causes subjective experience, the set of lights should experience the sight of the grandmother, with the impression of being the grandson. Would it?

Modern panpsychism: about the integrated information theory of consciousness

In the last decade, a number of neuroscientists have become interested in the question of consciousness: for example Christof Koch, Stanislas Dehaene, Gerald Edelman, and many others. There have been a number of interesting new insights on this old subject, mostly focused on the so-called “neural correlates of consciousness”, that is, the properties of neural activity that are associated with conscious states, as opposed to, say, coma. However, to my mind there is no convincing theory that explains what consciousness is, why we are conscious at all, and why we feel anything at all (phenomenal consciousness). But there have been attempts. A recent one is the integrated information theory (IIT) proposed by Tononi, which proposes that consciousness is a property of all systems that have a high level of “integrated information”. In a nutshell, such a system is a dynamical system that cannot be divided into smaller independent (or weakly dependent) systems. The term “information” should be understood in the sense of information theory: how much information (uncertainty reduction) there is in the state of a subsystem about the future of another subsystem. In short, the problem with this theory is that it is as much about consciousness as information theory is about information.

Christof Koch is a well-known fan of IIT. He describes the theory in a popular science article entitled “Ubiquitous minds” (available on his web page). I should point out that it is not an academic paper, so maybe my criticisms will seem unfair. To be fair, then, let us say that what follows is a criticism of the arguments in that article, but perhaps not of Koch's thought in general (admittedly, I have not read his book yet).

Koch correctly presents IIT as a modern form of panpsychism, that is, the idea that lots of things are conscious to some degree. Animals, of course, but also any kind of system, living or not, that has high “integrated information” (named “phi”). On his blog, Scott Aaronson, a theoretical computer scientist, gives an example of a matrix multiplication system that has this property and that should therefore be highly conscious according to IIT if it were physically implemented. Now Tononi and Koch do not see this counter-intuitive implication as a problem with the theory; on the contrary, they embrace it as a highly interesting implication. Koch speculates, for example, that the internet might be conscious.

Koch starts by describing the naïve version of panpsychism, which indeed can easily be dismissed. Naïve panpsychism states that everything is conscious, to different degrees: a brain, a tree, a rock. This immediately raises a big problem (referred to as the “problem of aggregates” in the article): you might claim that everything is conscious, but then you need to define what a “thing” is. Is half a rock conscious? Then which half? Is any set of 1000 particles randomly chosen in the universe conscious? Is half of my brain plus half of your stomach a conscious entity?

IIT is more restricted than naïve panpsychism, but it suffers from the same problem: how do you define a “system”? Wouldn't a subsystem of a conscious system also be conscious, according to the theory? As Koch writes, the theory offers no intrinsic solution to this problem; it must be augmented by an ad hoc postulate (that only “local maxima” of integrated information exist). What puzzles me is that the paper ends on the claim that IIT offers an “elegant explanation for [the existence of] subjective experience”. What I have read here is an interesting theory of interdependence in systems, and then a claim that systems made of interdependent parts are conscious. Where is the explanation in that? A word (“consciousness”) was arbitrarily attached to this particular property of systems, but no hint was provided at any point of a connection between the meaning of that word and the property of those systems. Why would this property produce consciousness? No explanation is given by the theory.

If it is not an explanation, then it must simply be a hypothesis: the hypothesis that systems with high integrated information are conscious. That is, it is a hypothesis about which systems are conscious and which are not. As we noted above, this hypothesis assigns consciousness to non-living things, possibly including the internet, and definitely including some rather stupid machines that no one would consider conscious. I would consider this a problem, but proponents of IIT would simply adopt panpsychism and consider that, counter-intuitively, those things are actually conscious. But then this means admitting that no observation whatsoever can give us any hint about which systems are conscious (contrary to the first pages of Koch's article, where he argues that animals are conscious on those grounds); in other words, that the hypothesis is metaphysical and not testable. So the hypothesis is either unscientific or wrong.

Now I am not saying that the theory is uninteresting. I simply think that it is a theory about consciousness and not of consciousness. What is it about, exactly? Let us go back to what integrated information is supposed to mean. Essentially, high integrated information means that the system cannot be subdivided into two independent systems – the future state of system A depends on the current state of system B, and conversely. This corresponds to an important property of consciousness: the unity of consciousness. You experience a single stream of consciousness that integrates sound, vision, etc. Sound and vision are not experienced by two separate minds but by a single one. Yet this is what should happen if there were two unconnected brain areas dealing with sound and light. Thus a necessary condition for a unique conscious experience is that the substrate of consciousness cannot be divided into causally independent subsets. This is an important requirement, and therefore I do think that the theory has interesting things to say about consciousness, in particular about what its substrate is, but it explains nothing about why there is a conscious experience at all. It provides a necessary condition for consciousness – and that's already quite good for a theory about consciousness.

But that's it. It does not explain why an interdependent system should be conscious – and in fact, given some examples of such systems, it seems unlikely that it is the case. What is missing in the theory? I hinted at it in my introduction: the problem with integrated information theory is that it is as much about consciousness as information theory is about information. The word “information” in information theory has little to do with information in the common sense of the word, that is, something that carries meaning for the receiver. Information theory is actually better described as a theory of communication. In fact, one should remember that Shannon's seminal paper was entitled “A Mathematical Theory of Communication”, not of information. In a communication channel, A is encoded into B by a dictionary, and B carries information about A insofar as one can recover A from B. But of course it only makes sense for the person sitting at the receiving end of the communication channel if 1) she has the dictionary, 2) A already makes sense to her. “Information” theory says nothing about how A acquires any meaning at all; it is just about the communication of information. For this reason, “integrated information” fails to address another important aspect of consciousness, which in philosophy is named “intentionality”: the idea that one is always conscious of something, i.e. consciousness has a “content” – not just a “quantity of consciousness”. Any theory that is solely based on information in Shannon's sense (dictionary) cannot say much about phenomenal consciousness (what it feels like).
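The point that Shannon information is about recoverability through a dictionary, not about meaning, can be sketched in a few lines of Python (the dictionary and the messages are invented for illustration):

```python
import math
from collections import Counter

def entropy(seq):
    """Shannon entropy (in bits) of the empirical distribution of a sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A "dictionary" maps messages to code symbols; meaning plays no role in it.
dictionary = {"grandmother": 0, "car": 1, "red": 2, "blue": 3}
messages = ["grandmother", "car", "red", "blue"] * 25   # a uniform source
codes = [dictionary[m] for m in messages]

# The code carries all the "information" of the source...
print(entropy(messages), entropy(codes))   # 2.0 2.0 (bits)
# ...and renaming every symbol (a permuted dictionary) changes nothing:
permuted = [(c + 1) % 4 for c in codes]
print(entropy(permuted))                   # still 2.0
```

The quantities are identical under any relabeling of the symbols: the theory measures how much can be recovered through the channel, and is entirely silent about what the symbols mean to anyone – which is exactly the gap between “integrated information” and phenomenal content.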

For the end of this post, I will simply quote Scott Aaronson:
“But let me end on a positive note. In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed. Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.”

Perceiving and knowing

Perceiving space is knowing where things are in the world. Or is it?

I am sitting in my living room, which has big windows overlooking a courtyard. The windows are sound-proof, so if I open just one of them, acoustic waves mostly enter the room through that window. Now someone enters the courtyard on the right, walks across it, and arrives at the door on the left. If I close my eyes, I know that the person is walking from right to left. However, what I hear is the sound of someone walking, always coming from the same direction, that of the window. If someone asks me where the person is at a given moment in time, I could point in the more or less correct direction, by inference. But this is not what I perceive. I always perceive the sound as coming from the same direction. There is a difference between perceiving (phenomenological) and knowing (conceptual). And there is a difference between phenomenology and behavior.

Another striking example is referred pain. Referred pain is pain felt at a location away from the actual site of the injury. For example, in a heart attack, one may feel pain in the arm rather than in the chest. This is a known phenomenon, and if you know it, you may correctly identify the location of the injury in the heart when you feel pain in the arm. But it doesn't change the fact that you feel the pain in the arm. You may entirely convince yourself that the injury is in the heart, and all your behavior might be consistent with that belief, but still you will feel the pain in the arm.

There are several interesting conclusions we can draw from these remarks. First, perception is not entirely reducible to behavior. Here we touch on the hard problem of consciousness (qualia): you could observe a cat turning its head toward a sound source and think that the cat perceives that the sound came from the source, but in reality you don't know. Maybe the cat perceives it somewhere else but corrects its movement because it knows its perception tends to be biased. With humans, you could perhaps distinguish between these possibilities because humans speak. But without this option, a purely functionalist approach to perception (in terms of relationships between sensory stimuli and behavior) misses part of the phenomenon.

Second, inference is not the same as perception. Spatial perception is not just the process of inferring where something is from sensory inputs. There is also the experience of perception, which is not captured by the objectivist view.

What is sound? (VII) The phenomenology of pitch

So far, I have focused on an ecological description of sound, that is, how sounds appear from the perspective of an organism in its environment: the structure of sound waves captured by the ears in relationship with the sound-producing object, and the structure of interaction with sounds. There is nothing psychological per se in this description. It only specifies what is available to our perception, in a way that does not presuppose knowledge about the world. I now want to describe the subjective experience of sounds in the same way, without preconceptions about what it might be. Such a preconception could be, for example, to say: pitch is the perceptual correlate of the periodicity of a sound wave. I am not saying that this is wrong, but I want to describe the experience of pitch as it appears subjectively to us, independently of what we may think it relates to.

This is in fact the approach of phenomenology. Phenomenology is a branch of philosophy that describes how things are given to consciousness, our subjective experience. It was introduced by Edmund Husserl and developed by a number of philosophers, including Merleau-Ponty and Sartre. The method of “phenomenological reduction” consists in suspending all beliefs we may have on the nature of things, to describe only how they appear to consciousness.

Here I will briefly discuss the phenomenology of pitch, which is the percept associated with how high or low a musical note is. A vowel also produces a similar experience. First of all, a pure tone feels like a constant sound, unlike a tone modulated at low frequency (say, a few Hz). This simple remark is already quite surprising. A pure tone is not a constant acoustical wave at all: it oscillates at a fast rate. Yet we feel it as a constant sound, as if nothing in it were changing at all. At the same time, we are not insensitive to this rate of change of the acoustical wave: if we vary the frequency of the pure tone, it feels very different. This feeling is what is commonly associated with pitch: when the frequency is increased, the tone feels “higher”; when it is decreased, it feels “lower”. Interestingly, the language we use to describe pitch is that of space. I am too influenced by my own language and my musical background to tell whether we actually feel high sounds as being physically high, but it is an interesting observation. For sure, though, low-pitched sounds tend to feel larger than high-pitched sounds – again a spatial dimension.
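The contrast between the rapidly varying wave and the constant percept is easy to exhibit numerically (a sketch; the 8 kHz sample rate and 20 ms window are arbitrary choices):

```python
import math

def pure_tone(freq, duration, rate=8000):
    """Samples of a pure tone: an acoustic wave oscillating hundreds of times per second."""
    n = int(duration * rate)
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

def rms(window):
    """Root-mean-square level of a short stretch of the wave."""
    return math.sqrt(sum(x * x for x in window) / len(window))

wave = pure_tone(440, 0.5)        # a 440 Hz tone that feels constant
print(min(wave), max(wave))       # yet the wave itself swings between about -1 and 1

# A short-term summary (RMS over 20 ms windows) is essentially flat,
# matching the constancy of the percept rather than the wave itself.
step = 160                        # 20 ms at 8 kHz
levels = [rms(wave[i:i + step]) for i in range(0, len(wave) - step + 1, step)]
print(max(levels) - min(levels))  # close to zero
```

Nothing here explains the percept, of course; it only makes the puzzle sharp: the physical signal changes 440 times per second, while everything that feels constant about it is a property of the wave over a window, not at an instant.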

A very distinct property of pitch is that changing the frequency of the tone, i.e., the temporal structure of the sound wave, does not produce a perceptual change along a temporal dimension. Pitch is not temporal in the sense of: there is one thing, and then there is another thing. With a pure tone, there always seems to be a single thing, not a succession of things. In contrast, with an amplitude-modulated tone, one can feel the sound becoming successively (but continuously) louder and softer. In the same way, if one hits a piano key, the loudness of the sound decreases. In both cases there is a distinct feel of time associated with the change in amplitude of the sound wave. And this feel does not exist for the fast amplitude change of a tone. This simple observation demonstrates that phenomenological time is distinct from physical time.

Another very salient point is that when the loudness of the sound of the piano key decreases, the pitch does not seem to change. Somehow the pitch seems to be invariant to this change. I would qualify this statement, however, because this might not be true at low levels.

When the frequency of a tone is swept upward, the sound seems to go higher, as when one asks a question; when it is swept downward, the sound seems to go lower, as when one ends a sentence. Here there is a feeling of time (first it is lower, then it is higher), corresponding to the temporal structure of the frequency change on a fast timescale.

Now when one compares two different sounds from the same instrument in sequence, there is usually a distinct feeling of one sound being higher than the other. However, when the two sounds are very close in pitch, for example when one tunes a guitar, it can be difficult to tell which one is higher, even though it may be clear that they have distinct pitches. When one plays two notes on different instruments, it is generally easy to tell whether they are the same note, but not always which one is higher. In fact this confusion is related to octave similarity: if two notes differing by an octave (which corresponds to doubling the frequency) are played on a piano, they sound very similar. If they are played together instead of sequentially, they seem to fuse, almost into a single note. It follows that pitch seems to have a somewhat circular, or helical, topology: there is an ordering from low to high, but at the same time the pitches of notes differing by an octave feel very similar.
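
The octave relation can be made concrete with a small numerical sketch (Python; the frequencies follow the standard equal-temperament convention, while the helix coordinates are just one hypothetical way to formalize the circular topology):

```python
import math

def note_frequency(midi_number, a4=440.0):
    # Equal-temperament frequency: each semitone multiplies frequency
    # by 2**(1/12), so notes 12 semitones (an octave) apart differ
    # by a factor of exactly 2.
    return a4 * 2 ** ((midi_number - 69) / 12)

def helix_coordinates(midi_number):
    # A pitch helix: the angle encodes the note within the octave
    # (chroma), the height rises with each octave. Notes an octave
    # apart share the same angle but differ in height.
    angle = 2 * math.pi * (midi_number % 12) / 12
    return (math.cos(angle), math.sin(angle), midi_number / 12)

f_a4 = note_frequency(69)  # A4 = 440 Hz
f_a5 = note_frequency(81)  # A5, one octave above: 880 Hz
```

Here A4 and A5 land at the same angular position on the helix, one turn apart, mirroring the observation that they sound very similar yet are ordered from low to high.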

If one plays a melody on one instrument and then the same melody on another instrument, they feel like the same melody, even though the acoustic waves are very different, and certainly they sound different. If one plays a piano key, it is generally easy to immediately sing the same note. Of course, when we say “the same note”, our voice actually produces a very different acoustical wave, and yet it feels like the same level of “highness”. These observations certainly support the theory that pitch is the perceptual correlate of the periodicity of the sound wave, with the qualification that low repetition rates (e.g., 1 Hz) actually produce a feel of temporal structure (a change in loudness or repeated sounds, depending on what is repeated in the acoustical wave) rather than a lower pitch.

The last observation is intriguing. We can repeat the pitch of a piano key with our voice, and yet most of us do not possess absolute pitch, the ability to name the piano key, even with musical training. It is intriguing because the muscular commands to the vocal system required to produce a given note are absolute, in the sense that they do not depend on musical context. This means, for most of us who do not possess absolute pitch, that these commands are not available to our consciousness as such. We can sing a note that we just heard, but we cannot sing a C. This suggests that we actually possess absolute pitch at a subconscious level.

I will come back to this point. First, we need to discuss relative pitch. What is meant by “relative pitch”? Essentially, it is the observation that two melodies played in different keys sound the same. This is not a trivial fact at all. Playing a melody in a different key means scaling the frequencies of all notes by the same factor, or equivalently, playing the fine structure of the melody at a different rate. The resulting sound wave is nothing like the original, either in the temporal domain (at any given time the acoustical pressures are completely different) or in the frequency domain (the spectra could be non-overlapping). The melody sounds the same when the fundamental frequencies are multiplied by the same factor, not when they are shifted by the same quantity. Note also that the melody is still recognizable when the durations of notes or gaps are changed, when the tempo is different, when the expressivity is changed (e.g., the loudness of notes), or when the melody is played staccato. This fact calls into question neurophysiological explanations based on adaptation.

Thus, it seems that, at a conscious level, what is perceived is primarily musical intervals. But even this description is probably not entirely right. It suggests that the pitch of a note is compared to the previous one to make sense. But if one hears the national anthem with a note removed, it will not feel like a different melody, but like the same melody with an ellipsis. It is thus more accurate to say that a note makes sense within a harmonic context, rather than with respect to the previous note.

This point is in fact familiar to musicians. If a song is played and then one is asked to sing another song, the singer will tend to start the melody in the same key as the previous song. The two songs are unrelated, so thinking in terms of intervals does not make sense. But somehow there seems to be a harmonic context in which notes are interpreted.

Now the fact that there is such an effect of the previous song means that the harmonic context is maintained in working memory. This does not seem to require any conscious effort or attention, as when one tries to remember a phone number. Somehow the context stays there, unconsciously, and determines the way in which future sounds are experienced. It is not even clear to introspection whether a harmonic context is being held in memory or has been “forgotten”.

Melodies can also be remembered for a long time. A striking observation is that it is impossible for most people to recall a known melody in the right key, the key in which it was originally played, and it is also impossible to tell whether the melody, played by someone else, is played in the right key. Somehow the original key is not memorized. Thus it seems that it is not the fundamental frequency of notes that is memorized. One could imagine that intervals are memorized rather than notes, but as I noted earlier, this is probably not right either. More plausible is the notion that it is the pitch of notes relative to the harmonic structure that is stored (i.e., pitch is relative to the key, not to the previous note).

We arrive at the notion that both the perception and the memory of pitch are relative, and relative in a harmonic sense, i.e., relative to the key and not in the sense of intervals between successive notes. Now what I find very puzzling is that the very fact that we can sing means that, at a subconscious level but not at a conscious level, we must have a notion of absolute pitch.

Another intriguing point is that we can imagine a note, play it in our head, and then try to play it on a piano: it may sound like the note we imagined, or it may sound too high or too low. We are thus able to compare a note that is physically played with a note that we consciously imagine. But we are apparently not conscious of pitch in an absolute sense, in a way that relates directly to properties of physical sounds. The only way I can see to resolve this apparent contradiction is to say that we imagine notes as degrees in a harmonic context (or musical scale), i.e., “tonic” for the C note in the key of C, “dominant” for the G note in the key of C, etc., and in the same way we perceive notes as degrees. Absolute pitch, independent of the musical key, is also present, but at a subconscious level.
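
The idea of hearing notes as degrees can be sketched as a toy mapping (Python; the note and degree names follow standard music theory, but the function itself is only an illustration of the relative-to-key view):

```python
NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F',
         'F#', 'G', 'G#', 'A', 'A#', 'B']

# Scale degrees of the major scale, indexed by semitones above the tonic
DEGREES = {0: 'tonic', 2: 'supertonic', 4: 'mediant', 5: 'subdominant',
           7: 'dominant', 9: 'submediant', 11: 'leading tone'}

def degree_in_key(note, key):
    # The same physical note maps to different degrees in different
    # keys: what is consciously perceived, on this view, is the degree.
    interval = (NOTES.index(note) - NOTES.index(key)) % 12
    return DEGREES.get(interval, 'chromatic')

# degree_in_key('G', 'C') -> 'dominant'
# degree_in_key('G', 'G') -> 'tonic'
```

The same G is “dominant” in the key of C but “tonic” in the key of G, which captures the claim that conscious pitch is relative to the harmonic context while the absolute note remains subconscious.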

I have only addressed a small portion of the phenomenology of pitch, since I have barely discussed harmony. But clearly, it appears that the phenomenology of pitch is very rich, and also not tied to the physics of sound in a straightforward way. It is deeply connected with the concepts of memory and time.

In light of these observations, it appears that current theories of pitch address very little of the phenomenology of pitch. In fact, all of them (both temporal and spectral theories) address the question of absolute pitch, something that most of us actually do not have conscious access to. It is even more limited than that: current models of pitch are meant to explain how the fundamental frequency of a sound can be estimated by the nervous system. Thus, they start from the physicalist postulate that pitch is the perceptual correlate of sound periodicity, which, as we have seen, is not unreasonable but remains a very superficial aspect of the phenomenology of pitch. They also focus on the problem of inference (how to estimate pitch) and not on the deeper problem of definition (what is pitch, why do some sounds produce pitch and not others, etc.).
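
To make concrete what such models aim to compute, here is a crude sketch of an autocorrelation-based fundamental-frequency estimator (a generic textbook method, not any specific published model): it finds the lag at which the signal best matches a delayed copy of itself, which is exactly the kind of periodicity estimation these theories address.

```python
import numpy as np

def estimate_f0(signal, fs, fmin=50.0, fmax=1000.0):
    """Crude autocorrelation pitch estimator: search lags within the
    plausible period range for the best match between the signal and
    a delayed copy of itself, and return the corresponding frequency."""
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    ac = np.array([np.dot(signal[:-lag], signal[lag:]) for lag in lags])
    return fs / lags[np.argmax(ac)]

fs = 44100
t = np.arange(0, 0.1, 1 / fs)
tone = np.sin(2 * np.pi * 220 * t)
# estimate_f0(tone, fs) is close to 220 Hz
```

Note how little of the phenomenology this touches: the estimator outputs an absolute frequency, precisely the quantity that, as argued above, we have no conscious access to.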

Affordances, subsumption, evolution and consciousness

James Gibson defended the idea that what we perceive of the environment is affordances, that is, the possibilities of interaction that things allow. For example, a knob affords twisting, and the ground affords support. The concept of affordance makes a lot of sense, but Gibson also insisted that we directly perceive these affordances. It has never been very clear to me what he meant by that. But following recent discussions, I have thought of a way in which this statement might make sense - although I have no idea whether this is what Gibson meant.

The way sensory systems work is traditionally described as an early extraction of “features”, like edges, which are then combined through a hierarchical architecture into more and more complex things, until one gets “an object”. In this view, affordances are obtained at the end of the chain, so their perception is not direct at all. In robotics, another kind of architecture was proposed by Rodney Brooks in the 1980s: the “subsumption architecture”. It was meant as a way to build robots incrementally, by progressively adding layers of complexity. In his example, the first layer of the robot is a simple control system by which external motor commands produce movement; a sonar computes a repulsive force when there is a wall in front of the robot, and that force is sent to the motor module. Then there is a second layer that makes the robot wander: it randomly chooses a direction at regular intervals and combines it with the force computed by the sonar in the first layer. The second layer is said to “subsume” the first one, i.e., it takes over. Then another level sits on top of that. The idea in this architecture is that the set of levels below any given level is functional: it can do something on its own. This is quite different from standard hierarchical sensory systems, in which the only purpose of each level is to send information to the next. Here we get to Gibson’s affordances: if the most elementary level must be functional, then what it senses is not simple features, but rather simple affordances, simple ways of interacting with the environment. So in this view, what is elementary in perception is affordances, rather than elementary sensations.
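
A toy sketch (in Python, loosely following Brooks's wall-avoidance example but with hypothetical class names and units) of a two-layer subsumption architecture, where each layer is functional on its own and the upper layer subsumes the lower one:

```python
import random

class AvoidLayer:
    """Lowest layer: senses an obstacle (sonar distance) and computes a
    repulsive force. By itself it already produces useful behavior."""
    def command(self, sonar_distance):
        # Repulsive force grows as the wall gets closer (hypothetical units)
        return -1.0 / max(sonar_distance, 0.1)

class WanderLayer:
    """Second layer: picks a random heading and *subsumes* the first
    layer by combining its own command with the lower layer's force."""
    def __init__(self, lower):
        self.lower = lower
    def command(self, sonar_distance):
        heading = random.uniform(-1, 1)  # choose a new direction
        return heading + self.lower.command(sonar_distance)

robot = WanderLayer(AvoidLayer())
motor = robot.command(sonar_distance=0.5)  # combined motor command
```

The point of the design is that what the wander layer receives from below is not a raw sensation (the sonar echo) but something already functional, a repulsive force: a simple affordance.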

I think it makes a lot of sense from an evolutionary point of view that sensory systems should look like subsumption architectures rather than standard hierarchical perception systems. If each new structure (say, the cortex) is added on top of an existing set of structures, then the old set of structures has a function by itself, independently of the new structure. Somehow the old set is “subsumed” by the new structure, and the information this new structure gets must then already have a functional meaning. This would mean that affordances are the basis, and not the end result, of the sensory system. In this sense, perhaps, one might say that affordances are “directly” perceived.

When thinking about what it means for consciousness, I like to refer to the man and horse analogy. The horse is perfectly functional by itself. It can run, it can see, etc. Now the man on top of it can “subsume” the horse. He sends commands to it so as to move it where he wants, and also gets signals from the horse. The man is conscious, but he has no idea of what the horse feels, for example how the horse feels the ground. All the sensations that underlie the man-horse’s ability to move around are inaccessible to the conscious man, but it is not a problem at all for the man to go where he wants to.

Now imagine that the man is blind. If there is an obstacle in front of the horse, the horse might stop, perhaps get nervous, things that the man can feel. The man cannot feel the wall in terms of “raw sensations”, but he can perceive that there is something that blocks the way. In other words, he can perceive the affordance of the wall – something that affords blocking, without seeing the wall.

So in this sense, it does not seem crazy anymore that what we directly perceive (we = our conscious self) is made of affordances rather than raw sensations.