What is computational neuroscience? (XIV) Analysis and synthesis

I would like to propose another way to describe the epistemological relationship between computational and experimental neuroscience. In acoustics, there is a methodology known as “analysis-synthesis” of sounds (Risset & Wessel, 1982) to understand what makes the quality (or “timbre”) of a sound (see in particular Gaver (1993), “How do we hear in the world?”). A first step is to examine the sound by various methods, for example acoustic analysis (looking at the spectrum, the temporal envelope, etc.), and to try to extract the salient features. A second step consists in synthesizing sounds that display these features. One then listens to these sounds to evaluate whether they successfully reproduce the quality of the original sounds. This evaluation step can be made objective with psychoacoustic experiments. The results of the synthesis step then inform the analysis, which can then focus on those aspects that were not correctly captured, and the procedure goes through a new iteration. The analysis can also be guided by physical analysis, i.e., by theory. For example, the perceived size of a sounding object should be related to its resonant frequencies, whose wavelengths correspond to the dimensions of the object. The type of material (wood, metal) should be related to the decay rate of the temporal envelope. By these principles, it is possible to synthesize convincing sounds of impacts on a wood plate, for example.

There is a direct analogy with the relationship between computational and experimental neuroscience. Experimental neuroscience aims at identifying various aspects of the nervous system that seem significant: this is the analysis step. The object of experiments is a fully functional organism, or a piece of it. The empirical findings are considered significant in relationship with the theory of the moment (perhaps in analogy with physical analysis in acoustics), and with the chosen method of analysis (type of measurement and experimental protocol). By themselves, they only indicate what might contribute to the function of the organism, not how it contributes to it. For example, if the attack of a piano sound is removed, it doesn’t sound like a piano anymore, so the attack is important to the quality of the piano sound. In the same way, lesion studies inform us of which parts of the brain are critical for a given function, but they do not tell us how exactly those parts contribute to the function. Computational neuroscience, then, can be viewed as the synthesis step. Starting from nothing (i.e., not from a fully functional organism), one tries to build a drastically simplified system, informed by the analysis step. But the goal is not to reproduce all the pieces of empirical data that were used to inform the system. The goal is to reproduce the function of the organism. In analogy with sound: the goal is not to reproduce detailed aspects of the spectrum, but rather to make the synthesized signal sound right. If the function is not correctly reproduced, then maybe the features identified by the analysis step were not the most relevant ones. In this way the synthesis step informs the analysis step.

This analogy highlights a few important epistemological specificities of computational neuroscience. Most importantly, computational neuroscience is primarily about explaining the function, and only secondarily the empirical data. Empirical experiments on the auditory system of the barn owl aim at explaining how the barn owl catches a mouse in the dark. Computational studies also aim at explaining how the barn owl catches a mouse in the dark, not at reproducing the results of the empirical experiments. Another way to put it: the data to be explained by the theory are not only what is explicitly stated in the Results section, but also another piece of empirical evidence that is implicitly stated in the Methods or the Introduction section, namely that before the experiment, the barn owl was a fully functional living organism able to catch prey in the dark. Secondly, computational neuroscience, as a synthetic approach, aims at a simple, conceptually meaningful description of the system. Realism is in the function (how the signal sounds), not in the amount of decoration aimed at mimicking pieces of empirical data.

This discussion also lends support to the criticism of epistemic reductionism. Imagine we could measure all the components of the brain and put them together in a realistic simulation of the brain (which already implies some form of methodological reductionism). This would correspond to fully analyzing the spectrum of a sound, recording it in complete detail, and then playing it back. What would be learned about what makes the quality of the sound? A second point is methodological: suppose we collect all the necessary data about the brain, but from different individual brains, and perhaps from a bunch of related species like mice. Would the result sound like a piano, or would it sound like a cacophony of different pianos and a violin?

What is computational neuroscience? (XIII) Making new theories

Almost all work in philosophy of science concerns the question of how a scientific theory is validated, by confronting it with empirical evidence. The converse question, how a theory is formulated in the first place, is considered a mysterious process that belongs to the field of psychology. As a result of this focus, one might be led to think that the essence of scientific activity is the confrontation of theories with empirical facts. This point stands out in the structure of biology articles, which generally consist of a short introduction, where the hypothesis is formulated; the methods, where the experiments are described; the results, where the outcome of the experiments is described; and the discussion, where the hypothesis is evaluated in light of the experimental results. The making of the theory generally occupies a negligible part of such articles.

Let us consider the problem from a logical point of view. At a given point of time, there is only a finite set of empirical elements that can be taken into account to formulate a theory. A theory, on the other hand, consists of universal statements that apply to an infinite number of predictions. Because the empirical basis to formulate a theory is finite, there are always an infinite number of possible theories that can be formulated. Therefore, from a purely logical point of view, it appears that the making of a theory is an arbitrary process. Imagine for example the following situation. One is presented with the first three observations of an infinite sequence of numbers: 2, 4 and 6. One theory could be: this is the sequence of even numbers, and the empirical prediction is that the next number is 8. Another theory would be: this is the beginning of a Fibonacci-like sequence, where each term is the sum of the two preceding ones, and so the next number should be 10. But it might also be that the next number is 7 or any other number. So no theory is a logical consequence of the observations.
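The toy example can be made concrete in a few lines of code; the two “theories” below are purely illustrative, of course.

```python
# Two "theories", both consistent with the finite observations 2, 4, 6,
# yet making different predictions about the next observation.
observations = [2, 4, 6]

def even_numbers_theory(seq):
    """Theory 1: the sequence enumerates the even numbers."""
    return seq[-1] + 2

def fibonacci_like_theory(seq):
    """Theory 2: each term is the sum of the two preceding terms."""
    return seq[-1] + seq[-2]

print(even_numbers_theory(observations))    # prints 8
print(fibonacci_like_theory(observations))  # prints 10
```

Both theories fit all the data available at the time of formulation; only a further observation can discriminate between them, and infinitely many other theories remain compatible with whatever is observed next.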

If what is meant by “scientific” is a process that is purely based on empirical evidence, then we must recognize that the making of a theory is a process that is not entirely scientific. This process is constrained by the empirical basis, and possibly by Popper’s falsifiability criterion (that the theory could be falsified by future experiments), but it leaves open a considerable number of possibilities. Whether a theory is “good” or “bad” can be partly judged by its consistency with the empirical evidence at the time when it is made, but mostly the empirical evaluation of a theory is posterior to its formulation. Thus, at the time when a theory is formulated, it may be considered interesting, i.e., worth investigating, rather than plausible. Therefore the choice of formulating one theory rather than another is determined by non-empirical criteria such as: the elegance and simplicity of the theory; its generality (whether it only accounts for current empirical evidence or also makes many new predictions); its similarity with fruitful theories in other fields; its consonance with convincing philosophical points of view; the fact that it may generalize over preexisting theories; the fact that it suggests new experiments that were not thought of before; the fact that it suggests connections between previously distinct theories.

Thus, theoretical activity reaches far beyond what is usually implicitly considered as scientific, i.e., the relationship with empirical evidence. Yet there is no science without theories.

What is computational neuroscience? (XII) Why do scientists disagree?

A striking fact about the making of science is that in any field of research, there are considerable disagreements between scientists. This is an interesting observation, because it contradicts the naive view of science as a progressive accumulation of knowledge. Indeed, if science worked in this way, then any disagreement should concern empirical data only (e.g. whether the measurements are correct). On the contrary, disagreements often concern the interpretation of data rather than the data themselves. The interpretative framework is provided by a scientific theory, and there are often several of them in any field of research. Another type of disagreement concerns the judgment of how convincingly some specific piece of data demonstrates a particular claim.

There are two possibilities: either a large proportion of scientists are bad scientists, who do not correctly apply sound scientific methodology, or the adherence to a theory and the judgment of particular claims are not entirely based on scientific principles. The difficulty with the first claim, of course, is that there is no systematic and objective criterion to judge what “good science” and “bad science” are. In fact, the very nature of this question is epistemological: how is knowledge acquired and how do we distinguish between different scientific theories? Thus, part of the disagreement between scientists is not scientific but epistemological. Epistemological questions are in fact at the core of scientific activity, and failure to recognize this point leads to the belief that there is a single way to do science, and therefore to dogmatism.

So why do scientists favor one theory rather than another, given the same body of empirical data? Since the choice is not purely empirical, it must rely on other factors that are not entirely scientific. I would argue that a major determinant of adherence to a particular theory, at least in neuroscience, is its consonance with philosophical conceptions that the scientist holds. These conceptions may not be recognized as such, because many scientists have limited knowledge of, or interest in, philosophy. One such conception would be, for example, that the objects of perception exist independently of the organism and that the function of a perceptual system is to represent them. Such a conception provides a framework in which empirical data are collected and interpreted, and therefore it is not generally part of the theoretical claims that are questioned by data. It is a point of view rather than a scientific statement, but it guides our scientific enquiry. Once we realize that we are in fact guided by philosophical conceptions, we can start questioning these conceptions. For example, why would the organism need to represent the external world if the world is already there to be seen? Shouldn’t a perceptual system provide ways to act in the world rather than represent it? Who reads the “representation” of the world? Given that the world can only be accessed through the senses, how can this representation be interpreted in terms of the external world?

Many scientists deny that philosophy is relevant to their work, because they consider that only science can answer scientific questions. However, given that the adherence of a scientist to a particular scientific theory (and therefore also the making of a scientific theory) is in fact guided by philosophical preconceptions, rejecting philosophy only means that the scientist may be guided by naive philosophical conceptions.

Finally, another determinant of adherence to a particular scientific theory is psychological, linked to the personal history of the scientist. The theory of cognitive dissonance, one of the most influential theories in psychology, claims that human psychology is driven by the need to minimize the dissonance between different cognitive elements. For example, when a piece of evidence is presented that contradicts the beliefs of the scientist, this produces cognitive dissonance and a drive to reduce it. There are different ways to reduce it. One is that the scientist changes her mind and adopts another theory that is consistent with the new piece of data. Another is that the piece of data is rejected or interpreted in a way that is consonant with the beliefs of the scientist, possibly by adding an ad hoc hypothesis. Another is to add consonant elements, e.g. by providing new pieces of evidence that support the beliefs of the scientist. Yet another is to seek consonant information and to avoid dissonant information (e.g. to read only those papers that are most likely to support one’s beliefs). The theory of cognitive dissonance predicts that the first way rarely occurs. Indeed, as the scientist develops her career within a given scientific theory, she develops more and more ways to discard dissonant pieces of information, seeks information that is consonant with the theory, and by taking all these decisions, many of them public, increases the dissonance between her behavior and contradictory elements. An important and counter-intuitive prediction of the theory of cognitive dissonance is that contradictory evidence generally reinforces the beliefs of a scientist who is deeply committed to a particular theory.

In summary, a large part of scientific activity, including the making of and adherence to scientific theories, relies on epistemological, philosophical and psychological elements.

What is computational neuroscience? (XI) Reductionism

Computational neuroscience is a field that seeks a mechanistic understanding of cognition. It has the ambition to explain how cognition arises from the interaction of neurons, to the point that if the rules that govern the brain are understood in sufficient detail, it should be in principle possible to simulate them on a computer. Therefore, the field of computational neuroscience is intrinsically reductionist: it is assumed that the whole (how the brain works) can be reduced to final elements that compose it.

To be more precise, this view refers to ontological reductionism. A view that is not ontologically reductionist would be, for example, vitalism: the idea that life is due to the existence of a vital force, without which any given set of molecules would not live. A similar view is that the mind comes from a non-material soul, which is not scientifically accessible, or at least not describable in terms of the interaction of material elements. One could also imagine that the mind arises from matter, but that there is no final intelligible element – e.g. neurons are as complex as the whole mind, and smaller elements are not more intelligible.
In modern science in general and in neuroscience in particular, ontological reductionism is fairly consensual. Computational neuroscience relies on this assumption. This is why criticisms of reductionism are sometimes wrongly perceived as if they were criticisms of the entire scientific enterprise. This perception is wrong because criticisms of reductionism are generally not about ontological reductionism but about other forms of reductionism, which are more questionable and controversial.

Methodological reductionism is the idea that the right way, or the only way, to understand the whole is to understand the elements that compose it. It is then assumed that the understanding of the whole (e.g. function) derives from this atomistic knowledge. For example, one would consider that the problem of memory is best addressed by understanding the mechanics of synaptic plasticity – e.g. how the activity of neurons changes the synapses between them. In genetics, one may consider that memory is best addressed by understanding which genes are responsible for memory, and how they control the production of proteins involved in the process. This is an assumption that is less consensual, in computational neuroscience or in science in general, including in physics. Historically, it is certainly not true that scientific enquiry in physics started from understanding microscopic laws before macroscopic laws. Classical mechanics came before quantum mechanics. In addition, macroscopic principles (such as thermodynamics and energy in general) and symmetry principles are also widely used in physics in place of microscopic laws (for example, to understand why soap makes spherical bubbles). However, this is a relatively weak criticism, as it can be conceived that macroscopic principles are derived from microscopic laws, even if this does not reflect the history of physics.

In life sciences, there are specific reasons to criticize methodological reductionism. The most common criticism in computational neuroscience is that, while function derives from the interaction of neurons, it can also be said that the way neurons interact together is indirectly determined by function, since living organisms are adapted to their environment through evolution. Therefore, unlike objects of physics, living beings are characterized by a circular rather than causal relationship between microscopic and macroscopic laws. This view underlies “principle-based” or “top-down” approaches in computational neuroscience. Note that this is a criticism of methodological reductionism, but not of ontological reductionism.

There is also a deeper criticism of methodological reductionism, following the theme of circularity. It stems from the view that the organization of life is circular. It has been developed by Humberto Maturana and Francisco Varela under the name “autopoiesis”, and by Robert Rosen under the name “M-R systems” (M for metabolism and R for repair). What defines an entity as living, before the fact that it may be able to reproduce itself, is the fact that it is able to live. It is such an obvious truth about life that it is easy to forget, but to maintain its existence as an energy-consuming organism is not trivial at all. Therefore, a living entity is viewed as a set of physical processes in interaction with the environment that are organized in such a way that they maintain their own existence. It follows that, while a part of a rock is a smaller rock, a part of a living being is generally not a living being. Each component of the living entity exists in relationship with the organization that defines the entity as living. For this reason, the organism cannot be fully understood by examining each element of its structure in isolation. This is so because the relationship between structure and organization is not causal but circular, while methodological reductionism assumes a causal relationship between the elements of structure and higher-order constructs (“function”). This criticism is deep, because it does not only claim that the whole cannot be understood by only looking at the parts, but also that the parts themselves cannot be fully understood without understanding the whole. That is, to understand what a neuron does, one must understand in what way it contributes to the organization of the brain (or more generally of the living entity).

Finally, there is another type of criticism of reductionism that has been formulated against attempts to simulate the brain. The criticism is that, even if we did manage to successfully simulate the entire brain, this would not imply that we understand it. In other words, to reproduce is not to understand. Indeed we can clone an animal, and this fact alone does not give us a deep understanding of the biology of that animal. One could object that the cloned animal is never exactly the same animal, but certainly the same could be said about the simulated brain. But proponents of the view that simulating a brain would necessarily imply understanding the brain may rather mean that such a simulation requires a detailed knowledge of the entire structure of the brain (ionic channels in neurons, connections between neurons, etc.) and that by having this detailed knowledge about everything that is in the brain, we would necessarily understand the brain. This form of reductionism is called epistemic reductionism. It is in a sense the converse of ontological reductionism. According to ontological reductionism, if you have a full mechanistic understanding of the brain, then you should be able to simulate it (provided adequate resources). Epistemic reductionism claims that this is not only a necessary condition but also a sufficient one: if you are able to simulate the brain, then you fully understand it. This is a much stronger form of reductionism.

Criticisms of reductionism can be summarized by their answers to the question: “Can we (in principle, one day) simulate the brain?”. Critics of ontological reductionism would answer negatively, arguing that there is something critical (e.g., the soul) that cannot be simulated. Critics of epistemic reductionism would answer: yes, but this would not necessarily help us understand the brain. Critics of methodological reductionism would answer: yes, and it would probably require a global understanding of the brain, but this could only be achieved by examining the organism as a system with an organization, rather than as a set of independent elements in interaction.

What is sound? (XIV) Are there unnatural sounds?

In a previous post, I argued that some artificial sounds might be wrongly presented as if they were not natural, because ecological environments are complex and so natural sounds are diverse. But what if they were actually not natural? Perhaps these particular sounds can be encountered in a natural environment, but there might be other sounds that can be synthesized and heard but that are never encountered in nature.

Why exactly do we care about this question? If we are interested in knowing whether these sounds exist in nature, it is because we hypothesize that they acquire a particular meaning that is related to the context in which they appear (e.g. a binaural sound with a large ITD is produced by a source located on the ipsilateral leading side). This is a form of objectivism: it is argued that if we subjectively lateralize a binaural sound with a 10 ms ITD to the right, it is because in nature, such a sound would actually be produced by a source located on the right. So in fact, what we are interested in is not only whether these sounds exist in nature, but also additionally whether we have encountered them in a meaningful situation.

So have we previously encountered all the sounds that we subjectively localize? Certainly this cannot be literally true, for a new sound (e.g. in a new acoustical environment) could then never be localized. Therefore there must be some level of extrapolation in our perception. It cannot be that what we perceive is a direct reflection of the world. In fact, there is a relationship between this point and the question of inductivism in philosophy of science. Inductivism is the position that a scientific theory can be deduced from the facts. But this cannot be true, for a scientific theory is a universal statement about the world, and no finite set of observations can imply a universal statement. No scientific theory is ever “true”: rather, it agrees with a large body of data collected so far, and it is understood that any theory is bound to be amended or changed for a new theory at some point. The same can be said about perception, for example sound localization: given a number of past observations, a perceptual theory can be formed that relates some acoustical properties and the spatial location of the source. This implies that there should be sounds that have never been heard but that can still be associated with a specific source location.

Now we reach an interesting point, because it means that there may be a relationship between phenomenology and biology. When sounds are presented that deviate from the set of natural sounds, their perceived quality says something about the perceptual theory that the individual has developed. This provides some credit to the idea that the fact we lateralize binaural sounds with large ITDs might say something about the way the auditory system processes binaural sounds – but of course this is probably not the best example since it may well be in agreement with an objectivist viewpoint.

What is sound? (XIII) Loudness constancy

Perhaps the biggest puzzle in loudness perception is why a pure tone, or a stationary sound such as a noise burst, feels like it has constant loudness. Or more generally: why does a pure tone feel like it is a constant sound? (both in loudness and other qualities like pitch)

The question is not obvious because physically, the acoustical wave changes all the time. Even though we are sensitive to this change in the temporal fine structure of the wave, because for example it contributes to our perception of pitch, we do not hear it as a change: we do not hear the amplitude rising and falling. Only the envelope remains constant, and this is an abstract property of the acoustical wave. We could have chosen another property. For example, in models of the auditory periphery, it is customary to represent the envelope as a low-pass filtered version of the rectified signal. But this does not produce an exactly constant signal for pure tones.
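To illustrate this last point, here is a minimal sketch (with arbitrary parameter values, not a model of the actual auditory periphery): half-wave rectification followed by a first-order low-pass filter leaves a residual ripple at the tone frequency, so the extracted “envelope” of a pure tone is not exactly constant.

```python
import numpy as np

# A crude envelope extractor: half-wave rectification followed by a
# first-order low-pass filter, applied to a pure tone.
fs = 44100.0  # sampling rate (Hz)
f = 440.0     # tone frequency (Hz)
t = np.arange(0, 0.2, 1 / fs)
tone = np.sin(2 * np.pi * f * t)

rectified = np.maximum(tone, 0.0)

# First-order low-pass filter (exponential smoothing), 50 Hz cutoff
tau = 1 / (2 * np.pi * 50.0)
alpha = (1 / fs) / (tau + 1 / fs)
env = np.zeros_like(rectified)
for i in range(1, len(rectified)):
    env[i] = env[i - 1] + alpha * (rectified[i] - env[i - 1])

steady = env[len(env) // 2:]      # discard the onset transient
ripple = steady.max() - steady.min()
print(ripple > 0.01)  # prints True: the "envelope" still fluctuates
```

A stronger low-pass filter would reduce the ripple but never remove it entirely, and it would also smear the sound's true onsets and offsets; the exactly constant percept is not a property of any such signal.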

Secondly, at the physiological level nothing is constant either for pure tones. The basilar membrane follows the temporal fine structure of the acoustical wave. The auditory nerve fibers fire at several hundred Hz. At low frequency they fire at specific phases of the tone. At higher frequency their firing seems more random. In both cases we hear a pure tone with a constant loudness. What is more, fibers adapt: they fire more at the onset of a tone, then their firing rate decreases with time. Yet we do not hear the loudness decreasing. On the other hand, when we strike a piano key, the level (envelope) of the acoustical wave decreases and we can hear this very distinctly. In both cases (pure tone and piano key) the firing rate of fibers decreases, but in one case we hear a constant loudness and in the other case a decreasing loudness.

Finally, it is not just that some high-level property of sound feels constant, but with a pure tone we are simply unable to hear any variation in the sound at all, whether in loudness or in any other quality.

This discussion raises the question: what does it mean that something changes perceptually? To (tentatively) answer this question, I will start with pitch constancy. A pure tone feels like it has a constant pitch. If its frequency is progressively increased, then we feel that the pitch increases. If the frequency remains constant, then the pure tone feels like a completely constant percept. We do not feel the acoustical pressure going up and down. Why? The pure tone has this characteristic property that from the observation of a few periods of the wave, it is possible to predict the entire future wave. Pitch is indeed associated with the periodicity of the sound wave. If the basis of what we perceive as pitch is this periodicity relationship, then as the acoustical wave unfolds, this relationship (or law) remains constantly valid and so the perceived pitch should remain constant. There is some variation in the acoustical pressure, but not in the law that the signal follows. So there is in fact some constancy, but at the level of the relationships or laws that the signal follows. I would propose that the pure tone feels constant because the signal never deviates from the perceptual expectation.

This hypothesis about perceptual constancy implies several non-trivial facts: 1) how sensory signals are presented to the system (in the form of spike trains or acoustical signals) is largely irrelevant, if these specific aspects of presentation (or “projection”) can be included in the expectation; 2) signal variations are not perceived as variations if they are expected; 3) signal variations are not perceived if there is no expectation. This last point deserves further explanation. To perceive a change, an expectation must be formed prior to this change, and then violated: the variation must be surprising, and surprise is defined by the relation between the expectation (which can be precise or broad) and the variation. So if there is no expectation (expectation is broad), then we cannot perceive variation.

From this hypothesis it follows that a completely predictable acoustical wave such as a pure tone should produce a constant percept. Let us come back to the initial problem, loudness constancy, and consider that the firing rate of auditory nerve fibers adapts. For a tone of constant intensity, the firing rate decays at some speed. For tones of increasing intensity, the firing rate might decay at a slower speed, or even increase. For tones of decreasing intensity, the firing rate would decay faster. How is it that constant loudness corresponds to the specific speed of decay that is obtained for the tone of constant intensity, if the auditory system never has direct access to the acoustical signals?

Loudness constancy seems more difficult to explain than pitch constancy. I will start with the ecological viewpoint. In an ecological environment, many natural sounds are transient (e.g. impacts) and therefore do not have constant intensity. However, even though the intensity of an impact sound decays, its perceived loudness may not decay, i.e., it may be perceived as a single sound occurring at a specific time (e.g. a footstep). There are also natural sounds that are stationary and therefore have constant intensity, at least at a large enough timescale: a river, the wind. However, these sounds do not address the problem of neural adaptation, as adaptation only applies to sounds with a sharp onset. Finally, vocalizations have a sharp onset and slowly varying intensity (although this might be questionable). Thus, for a vocalization, the expected intensity profile is constant, and it could therefore be speculated that this explains the relationship between constant loudness and constant intensity, despite variations at the neurophysiological level.

A second line of explanation is related to the view of loudness as a perceptual correlate of intelligibility. A pure tone presented in a stationary background has constant intelligibility (or signal-to-noise ratio), and this fact is independent of any further (non-destructive) processing applied to the acoustical wave. Therefore, the fact that loudness is constant for a pure tone is consistent with the view that loudness primarily reflects the intelligibility of sounds.

What is sound? (XII) Unnatural binaural sounds

Some types of artificial sounds presented through headphones are sometimes described as not natural, in the sense that they have binaural relationships that sounds in a natural environment do not have. In general, this qualification refers to the qualities of point sources in an anechoic environment, but real environments reflect sounds and there are also more complex sound sources. I will discuss two types of “unnatural” binaural sounds.

1) Binaural noise with long interaural time differences (ITDs). In an anechoic environment, the ITD of a sound can reach 600-700 µs at high frequency for humans, and perhaps up to 800-900 µs at low frequency. Yet if we listen to binaurally delayed noise through headphones with an ITD of several milliseconds, we hear a single source, lateralized to one side. When the ITD is increased, starting from 0 µs, perceived lateralization progressively increases up to about 1 ms or a bit less, then reaches a plateau. We hear two separate noises only when the ITD exceeds about 10 ms. This is surprising because such values are much larger than the maximal ITD that can be produced by a single sound source in an anechoic environment. However, let us consider a situation where there is an acoustically reflecting surface, a vertical wall, on our left side, about two meters away. A sound source is far away on the opposite side. In this case, the right ear receives the direct wave from the source and the left ear receives the reflected wave. It follows that the ITD is on the order of 10 ms. In addition, the direction of the sound source is consistent with the headphone experiments. Therefore, large ITDs may not be so unlikely in natural environments, even with a simple point sound source.
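A quick sanity check of this scenario (a sketch using the hypothetical distances given above): the reflected wave travels to the wall and back, so it covers roughly twice the wall distance in excess of the direct path.

```python
# Hypothetical scenario: a vertical wall 2 m to the left of the listener,
# a sound source far away on the right. The right ear receives the direct
# wave; the left ear receives the wave reflected off the wall, which has
# traveled an extra path of roughly twice the wall distance.
speed_of_sound = 343.0   # m/s in air at about 20 degrees C
wall_distance = 2.0      # m, listener to wall

extra_path = 2 * wall_distance             # m
itd = 1e3 * extra_path / speed_of_sound    # interaural time difference, ms
print(round(itd, 1))  # prints 11.7
```

The exact value depends on the geometry, but the order of magnitude (about 10 ms) is robust.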

2) Uncorrelated binaural noise. If acoustical noise is the sum of many independent sound sources, one would expect the signals at the two ears to be correlated at low frequencies, that is, when the period is large compared to the maximum ITD of the sound sources. Are there sounds in an ecological environment that are binaurally uncorrelated? I would suggest the following situation. You are riding a bicycle, and you feel the wind in your face and in your ears. The sound of the wind does not feel localized anywhere but at the ears. There is also little reason to believe that the air pressure is highly correlated at the two ears, except perhaps at very low frequencies - although this should be measured. In fact, this acoustical situation is correlated with mechanical pressure on the ears, which can be sensed by tactile receptors. I would suggest that this type of sound is perceptually localized at the ears because of this ecological association.

In summary, acoustical situations are considerably diverse in ecological environments, and therefore there might be fewer “unnatural” sounds than often assumed.

What is sound? (XI) What is loudness?

In the previous post, I discussed proximal aspects of loudness, which depend on the acoustical wave at the ear. For example, when we say that a sound is too loud, we are referring to an unpleasant feeling related to the effect of acoustical waves sensed at the ear. That is, the same sound source would feel less loud if it were far from the ear.

But we can also perceive the “intrinsic” loudness of a sound source, that is, that aspect of loudness that is not affected by distance. This is a distal property, which I will call source loudness. The loudness of a sound can be defined as proximal or distal in the same way as a visual object has a size on the retina (proximal) and a physical size as an external object (distal).

First of all, what can possibly be meant by source loudness? We may consider that it is a perceptual correlate of an acoustical property of the sound source, for example the energy radiated by a sound source. The acoustical energy at a given point depends on the distance to the source through the inverse square law, but the total energy at a given distance (integrated on the whole surface of a sphere) is constant (neglecting reflections). However, we cannot sense this kind of invariant since it implies sampling the acoustical wave in the entire space (but see the last comments in this post).
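
The invariant mentioned above follows from the inverse square law: intensity at distance r decays as 1/r², but the surface of the sphere grows as 4πr², so their product is the radiated power, whatever the radius. A minimal numerical check (free field, no reflections, isotropic source assumed):

```python
import math

def intensity(radiated_power_w, distance_m):
    """Acoustic intensity (W/m²) at a given distance from an
    isotropic source in a free field: the inverse square law."""
    return radiated_power_w / (4.0 * math.pi * distance_m ** 2)

def power_through_sphere(radiated_power_w, distance_m):
    """Intensity integrated over the sphere of that radius."""
    sphere_area = 4.0 * math.pi * distance_m ** 2
    return intensity(radiated_power_w, distance_m) * sphere_area

# Intensity at a point decays with distance, but the integral over
# the whole sphere stays equal to the radiated power:
for r in (1.0, 2.0, 10.0):
    print(r, intensity(1.0, r), power_through_sphere(1.0, r))
```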

An alternative is to consider that source loudness is that property of sound that does not vary with distance, and more generally with the acoustical environment. The problem with this definition is that it applies to all distal properties of sound (pitch, speaker identity, etc). The fact that we refer to source loudness using the word loudness suggests that there is a relationship between proximal and distal loudness. Therefore, we may consider that source loudness is that property of sound that is univocally related to (expected) proximal loudness at a reference location (say, at arm's length). Defined in this way, source loudness indeed does not depend on distance. Indirectly, it is a property of the sound field radiated by the source, although it is defined in reference to the perceptual system (since proximal loudness is not an intrinsic property of acoustical waves, as I noted in the previous post).

Another way to define source loudness involves action. For example, we can produce sounds by hitting different objects. The loudness of the sound correlates with the energy we put in the action. So we could imagine that source loudness corresponds to the energy we estimate necessary to produce the sound. This also gives a definition that does not depend on source distance. However, hitting the ground produces a sound that feels louder when the ground is hard (concrete) than when it is soft (grass, snow). Some of the energy we put into the mechanical interaction is dissipated and some is radiated, and it seems that only the radiated energy contributes to perceived loudness. Therefore, this definition is not entirely satisfying. I would not entirely discard it, because as we have seen, loudness is not a unitary percept. It might also be relevant for speech: when we say that someone screams or speaks softly, we are referring to the way the vocal cords are excited. Thus, this is a distal notion of loudness that is object-specific.

So we have two notions of source loudness, which are invariant with respect to distance. One is related to proximal loudness; the other one is related to the mechanical energy required to produce the sound and is source-specific (in the sense that the source must be known). The next question is: what exactly in the acoustical waves is invariant with respect to distance? In Gibsonian terms, what is the invariant structure related to source loudness?

Let us start with the second notion. What in the acoustical wave specifies the energy put into the mechanical interaction? If this property is invariant to distance, then it should be invariant with respect to scaling the acoustical wave. It follows that such information can only be captured if the relationship between interaction strength and the resulting wave is nonlinear. Source loudness in this sense is therefore a measure of nonlinearity that is source-specific.
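
This point can be illustrated with a toy source model (entirely hypothetical, for illustration only): suppose hitting an object with strength a produces a fundamental of amplitude a plus a quadratic distortion term of amplitude a². Distance scales both components by the same 1/r factor, so their ratio specifies the interaction strength independently of distance:

```python
def received_harmonics(strength, distance):
    """Hypothetical nonlinear source: the fundamental has amplitude
    `strength` and a quadratic distortion term has amplitude
    `strength**2`; propagation scales both by 1/distance."""
    h1 = strength / distance
    h2 = strength ** 2 / distance
    return h1, h2

def inferred_strength(h1, h2):
    """The harmonic ratio cancels the 1/distance factor, so it
    reveals the interaction strength at any distance."""
    return h2 / h1

for r in (1.0, 3.0, 10.0):
    h1, h2 = received_harmonics(0.5, r)
    print(r, inferred_strength(h1, h2))  # 0.5 at every distance
```

A purely linear source (h2 = 0) would carry no such invariant: every feature of the wave would scale with distance, which is why this notion of source loudness requires a nonlinearity.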

The first notion defines source loudness in relationship to proximal loudness at a reference distance. What in the acoustical wave specifies the proximal loudness that would be perceived at a different distance? One possibility is that this involves a highly inferential process: source loudness is first perceived in a source-specific way (previous paragraph), and then associated with proximal loudness. Another inferential process would be: distance is estimated using various cues, and then proximal loudness at a reference distance is inferred from proximal loudness at the current location. One such cue is the spectrum: the air absorbs high frequencies more than low frequencies, and therefore distant sounds have less high frequency content. Of course this is an ambiguous cue since spectrum at the ear also depends on the spectrum at the source, so it is only a cue in a statistical sense (i.e., given the expected spectral shape of natural sounds).

There is another possibility that was tested with psychophysical experiments in a very interesting study (Zahorik & Wightman, 2001). The subjects listen to noise bursts played from various distances at various intensities, and are asked to evaluate source loudness. The results show that 1) the evaluation of loudness does not depend on distance, 2) the scale of loudness depends on source intensity in the same way as for proximal loudness (loudness at the ears). This may seem surprising, since the sounds have no structure, they do not have the typical spectrum of natural sounds (which tends to decay as 1/f) and there is no nonlinearity involved. The key is that the sounds were presented in a reverberating environment (a room). The authors propose that loudness constancy is due to the properties of diffuse fields. In acoustics, a diffuse field has the property that it is identical at all spatial locations within the environment. This is never entirely true of natural environments, but some reverberant environments are close to it. This implies that the reverberant part of the signal depends linearly on the source signal but does not depend on distance. Therefore, the reverberant part is invariant with respect to source location and can provide the basis for the notion of source loudness that we are considering. Reverberation preserves the spectrum of the source signal, but not the temporal envelope (which is blurred). However, since reverberation depends on the specific acoustical environment, it is in principle only informative about the relative loudness of different sources; importantly, though, it allows comparisons between different types of sources.
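
The diffuse-field argument can be sketched with a toy room model (the 0.05 reverberant gain is an arbitrary assumption, not a value from the study): direct energy follows the inverse square law, while diffuse reverberant energy scales with the source but is the same everywhere in the room:

```python
def received_energies(source_energy, distance, reverb_gain=0.05):
    """Toy room model: direct energy follows the inverse square law,
    while the diffuse reverberant energy is proportional to the
    source energy but identical at every position in the room."""
    direct = source_energy / distance ** 2
    reverberant = reverb_gain * source_energy
    return direct, reverberant

# Doubling the distance quarters the direct energy but leaves the
# reverberant energy, the proposed basis of source loudness, intact:
for r in (1.0, 2.0, 4.0):
    print(r, received_energies(1.0, r))
```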

Alternatively, the ratio between direct and reverberant energy provides a way to estimate the distance of the source, from which source loudness can be deduced. But note that estimating the distance is in fact not necessary to estimate source loudness. The study does not mention cues due to early reflections on the ground. Indeed, a reflection on the ground interferes with the direct signal at a specific frequency that is inversely proportional to the delay between the direct and reflected signals. This could be a monaural or binaural cue to distance (Gourévitch & Brette, 2012).
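
The inverse proportionality is easy to make explicit: a direct wave plus a non-inverting reflection delayed by τ cancel each other first when the delay equals half a period, i.e. at f = 1/(2τ). A small sketch (the example delays and the speed of sound are assumptions):

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumption)

def first_notch_hz(delay_s):
    """First destructive-interference frequency for a direct wave
    plus a single (non-inverting) reflection: cancellation occurs
    when the delay equals half a period, i.e. f = 1/(2*delay)."""
    return 1.0 / (2.0 * delay_s)

# A ground reflection arriving 1 ms after the direct wave (an extra
# path of ~34 cm) puts the first spectral notch near 500 Hz:
for delay_ms in (0.5, 1.0, 2.0):
    print(delay_ms, first_notch_hz(delay_ms * 1e-3))
```

Since the delay between direct and reflected paths shrinks as the source moves away, the notch frequency shifts with distance, which is what makes it usable as a distance cue.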

To conclude this post, we have seen that loudness actually encompasses several distinct notions:

1) a proximal notion that is related to intelligibility (as in “not loud enough”), and therefore to the relationship between the signal of interest and the background, considered as a distracter;

2) a proximal notion that is related to biological responses to the acoustical signal (as in “too loud”), which may (speculatively) be numerous (energy consumption, risk of cochlear damage, startle reflex);

3) a distal notion that relates to the mechanical energy involved in producing the sound (a sensorimotor notion), which is source-specific;

4) a distal notion that relates to the sound field radiated from the source, independently of the distance, which may be defined as the expected proximal loudness at a reference distance.

Notes on consciousness. (I) Time

What is time and how is it perceived? This is of course a vast philosophical question, of which I will only scratch the surface.

1) Time, space and existence

It is customary to describe time as “the fourth dimension”. This point of view comes from the equations of mechanics and is highly misleading, because it seems to imply that time is of the same kind as space. A century ago, Henri Poincaré noted that our concept of space, both perceptually and scientifically, derives from our physical interactions with the world. That is to say, knowing where something is is knowing how to get there. Space is defined by the laws that govern movements in the physical world and the structure of these laws (Euclidean geometry). A law, some property that does not change, can only be defined with respect to something that changes. Therefore, time, defined as the source of change in the world, is a prerequisite to space. Space exists only by its persistence through the passing of time.

2) Time and change

In fact, nothing exists without the passing of time, because the essence is precisely what does not change through the flow of time. If we see someone throwing a ball, that ball is moving. Our visual sensations change, but we see a ball in movement: this is to say that there is something in the visual signals that does not change, which characterizes the ball as such. We do not see an object in the flickering white noise of a TV set.

In the TV series Bewitched, Samantha the housewife twitches her nose and everyone freezes except her. Then she twitches her nose and everyone unfreezes, without noticing that anything happened. For them, time has effectively stopped. This is to say that time is not perceived as such, but only through the changes it causes in our body. It is these changes that are perceived, not time per se (i.e., not time as in the variable in the equations of mechanics).

3) Irreversibility of time

From the fact that time is the perceived cause of changes, it follows that time has a direction, because physical processes are generally irreversible. This is also related to the data processing inequality in information theory, which states that information can only be lost, never gained, when a process is applied to a variable. The current state of a physical system results from previous processes only, which constitutes “the past”.

A physical system in which events occur (our body) can be seen as a dynamical system, or a series of processes that make the state of the system evolve. From one state s, the system subsequently changes to state s’. There is a direction to this change: s -> s’. This is the action of time on the system, and it is directed (the “arrow of time”). If the system were isolated, then time would be arbitrary. One could consider any dimension that is isomorphic to time and preserves directionality, and call it “time”, without changing the organization of changes within the system. It would make no difference for the system.

4) The unity of time

This raises the question of the perceptual unity of time: if time is perceived through changes in our body, then why do we feel that time is a single thing, when lots of different things change in our body? How is it that an auditory event and a visual event can appear to occur “at the same time”, given that they impact different receptors? Why isn’t there a different time for each process in our body? What does it mean that an event occurs “before” another one?

Imagine two independent processes that are spatially separated. From the perspective of these processes, it would make no difference if time passed at a different pace. The unity of time must come from an interaction between processes. The interaction between different processes defines a common flow of time.

Going further would probably require a discussion of consciousness and working memory, so I will leave these questions mostly unanswered for now.

5) The grain of time

How fine is our perception of time? When we listen to an auditory click played through headphones with a 500 µs delay between the two ears, we do not hear two clicks. We hear a single click, lateralized toward one side. If we repeatedly play clicks at 50 Hz (every 20 ms), we do not hear a series of clicks. We hear a single continuous sound. When we listen to a pure tone at 50 Hz, the instantaneous pressure varies all the time, but we do not hear this variation. On the contrary, it feels like the tone has constant loudness.

These remarks suggest that our perception of time has a “grain” of a few tens of ms. That is, processes occurring within a few tens of ms are perceived as being caused by the same event, and the temporal occurrence of events within that time window is not perceived as time. Why?

To see how tricky this is, consider again the first example, when we listen to two clicks delayed by 500 µs between the two ears. The temporal order of the clicks can be clearly distinguished: if the click is first played in the left earphone, the sound is perceived as coming from the left, and conversely if the click is first played in the right earphone. In addition, if the delay between the two clicks is changed, then the sound is perceived as coming from a different direction (usually somewhere between the two ears), in a way that is reproducible. Such changes are perceived when the delay is changed by about 20 µs.
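
To appreciate the scale of these numbers, it helps to convert ITDs into path-length differences in air (the speed of sound is taken as 343 m/s, an assumption):

```python
SPEED_OF_SOUND = 343.0  # m/s in air (assumption)

def path_difference_mm(itd_s):
    """Path-length difference in air corresponding to an ITD."""
    return itd_s * SPEED_OF_SOUND * 1e3

print(path_difference_mm(20e-6))   # ~6.9 mm: the just-noticeable change
print(path_difference_mm(500e-6))  # ~17 cm: the click delay used above
```

A 20 µs change thus corresponds to a few millimeters of acoustic path, far below the spatial scale of the head.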

So from a computational point of view, time is processed with a grain of 20 µs. But phenomenologically, time appears to have a grain about a thousand times larger. Why such a difference? The perceptual grain of time does not appear to reflect the precision of neural processing; in other words, it does not correspond to the timescale at which the states of the brain can be considered constant.

6) Duration

This post probably raised more questions than I could answer. I will end it with a discussion of the concept of duration. Spinoza described it as follows: “Duration is an attribute under which we conceive the existence of created things insofar as they persevere in their actuality”. This is essentially the point I developed at the beginning of this post. In contrast with, say, color and pitch, duration is not a quality of things. Duration is about existence (the fact that a thing exists), while color or pitch is about essence (what this thing is). Properties of objects are defined by their persistence through time, but duration does not persist through time. Rather, duration quantifies for how long some properties exist. For example, it can be said that a musical note has a timbre (the instrument), a pitch and a duration. These are not three independent qualities: duration is about the pitch and timbre (for how long they can be said to exist), but timbre is not about duration.

In summary: time is about existence, space is about essence.

Is perception about inference?

One philosophical theory about perception claims that perceiving is inferring the external world from the sensory signals. The argumentation goes as follows. Consider the retina: there is a spatially inhomogeneous set of photoreceptors; the image projected onto the retina is inverted, but you don’t see the world upside down; there are blood vessels that you normally don’t see; there is a blind spot where the optic nerve starts that you normally don’t notice; the color properties of photoreceptors and their spatial sampling are inhomogeneous, and yet color doesn’t change when you move the eyes. Perceptually, the visual field seems homogeneous and independent of the position of the eyes, apart from a change of perspective. So certainly what you perceive is not the raw sensations coming from your photoreceptors. These raw sensations are indirectly produced by things in the world, which have some constancy (compared to eye movements, for example). The visual signals in your retina are not constant, but somehow your perception is constant. Therefore, so the argument goes, your mind must be reconstructing the external world from the sensory signals, and what you perceive is this reconstruction.

Secondly, visual signals are ambiguous. A classical example is the Necker cube: a wire frame cube drawn in isometric perspective on a piece of paper, which can be perceived in two different ways. More generally, the three-dimensional world is projected on your retina as a two-dimensional image, and yet we see in three dimensions: the full 3D shape of objects must then be inferred. Another example is that in the dark, visual signals are noisy and yet you can see the world, although less clearly, and you don’t see noise.

I would then like to consider the following question: why, when I am looking at an apple, do I not see the back of the apple?

The answer is so obvious that the question sounds silly. Obviously, there is no light going through the object to our eyes, so how could we see anything behind it? Well, precisely: the inference view claims that we perceive things that are not present in the sensory signals but inferred from them. In the case of the Necker cube, there is nothing in the image itself that informs us of the true three-dimensional shape of the cube; there are just two consistent possibilities. But in the same way, when we see an apple, there are a number of plausible possibilities about what the back of the apple might look like, and yet we only see the front of the apple. Certainly we see an apple, and we can guess what the back of the apple looks like, but we do not perceive it. A counter-argument would be that inference about the world is partial: of course we cannot infer what is visually occluded by an object. But this is circular reasoning: perception is the result of inference, but we only infer what can be perceived.

One line of criticism of the objectivist/inferential view starts from Kant’s remark that anything we can ever experience comes from our senses, and therefore one cannot experience the objective world as such, even through inference, since we have never had access to the things to be inferred. This leads to James Gibson’s ecological theory of perception, in which the (phenomenal) world is directly perceived as the invariant structure in the sensory signals (the laws that the signals follow, potentially including self-generated movements). This view is appealing in many respects because it solves the problem raised by Kant (who concluded that there must be an innate notion of space). But it does not account for the examples that motivate the inferential view, such as the Necker cube (or in fact the perception of drawings in general). A related view, O’Regan’s sensorimotor theory of perception, also considers that objects of perception must be defined in terms of relationships between signals (including motor signals) but does not reject the possibility of inference. Simply, what is to be inferred is not an external description of the world but the effect of actions on sensory signals.

So some of the problems of the objectivist inferential view can be solved by redefining what is to be inferred. However, it still remains that in an inferential process, the result of inference is in a sense always greater than its premises: there is more than is directly implied by the current sensory signals. For example, if I infer that there is an apple, I can have some expectations about how the apple should look if I turn it, and I may be wrong. But this part where I may be wrong, the predictions that I haven’t checked, I do not actually see; I can imagine it, perhaps.

Therefore, perception cannot be the result of inference. I suggest that perception involves two processes: 1) an inferential process, which consists in making a hypothesis about sensory signals and their relationship with action; 2) a testing process, in which the hypothesis is tested against sensory signals, possibly involving an action (e.g. an eye movement). These two processes can be seen as coupled, since new sensory signals are produced by the second process. I suggest that it is the second process (which is conditioned by the first one) that gives rise to conscious perception. In other words, to perceive is to check a hypothesis about the senses (possibly involving action). According to this proposition, subliminal perception is possible. That is, a hypothesis may be formed with insufficient time to test it. In this case, the stimulus is not perceived. But it may still influence the way subsequent stimuli are perceived, by influencing future hypotheses or tests.

Update. In The world as an outside memory, Kevin O'Regan expressed a similar view: "It is the act of looking that makes things visible".