Perhaps the biggest puzzle in loudness perception is why a pure tone, or a stationary sound such as a noise burst, feels like it has constant loudness. Or, more generally: why does a pure tone feel like a constant sound, both in loudness and in other qualities such as pitch?
The answer is not obvious because, physically, the acoustical wave changes all the time. We are sensitive to this change in the temporal fine structure of the wave (it contributes, for example, to our perception of pitch), yet we do not hear it as a change: we do not hear the amplitude rising and falling. Only the envelope remains constant, and the envelope is an abstract property of the acoustical wave. We could have chosen another property. For example, in models of the auditory periphery, it is customary to represent the envelope as a low-pass filtered version of the rectified signal. But this does not produce an exactly constant signal for a pure tone: the rectified tone still contains components at multiples of the tone frequency, which a realistic low-pass filter attenuates but does not remove entirely.
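To make this last point concrete, here is a minimal sketch of such an envelope extractor. The specific choices (full-wave rectification, a one-pole low-pass filter, a 16 kHz sampling rate, a 200 Hz tone and a 50 Hz cutoff) are illustrative assumptions of mine, not parameters of any particular peripheral model.

```python
import math

def envelope(signal, fs, cutoff_hz):
    """Full-wave rectify, then smooth with a one-pole low-pass filter."""
    # Smoothing coefficient of a first-order low-pass (illustrative choice).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in signal:
        y += alpha * (abs(x) - y)   # rectification is the abs(), filtering the update
        env.append(y)
    return env

fs = 16000    # sampling rate in Hz (assumed)
f0 = 200.0    # tone frequency in Hz (assumed)
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(fs)]  # 1 s of tone

env = envelope(tone, fs, cutoff_hz=50.0)
steady = env[fs // 2:]              # discard the filter's initial transient
ripple = max(steady) - min(steady)  # residual fluctuation of the "envelope"
mean_level = sum(steady) / len(steady)
print(f"mean envelope: {mean_level:.3f}, peak-to-peak ripple: {ripple:.3f}")
```

The computed envelope hovers around a stable mean but keeps fluctuating at the rate of the rectified tone. Lowering the cutoff shrinks the ripple without ever making it exactly zero.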
Second, nothing is constant at the physiological level either, even for pure tones. The basilar membrane follows the temporal fine structure of the acoustical wave. The auditory nerve fibers fire at several hundred Hz. At low frequencies they fire at specific phases of the tone. At higher frequencies their firing seems more random. In both cases we hear a pure tone with a constant loudness. What is more, fibers adapt: they fire more at the onset of a tone, then their firing rate decreases with time. Yet we do not hear the loudness decreasing. On the other hand, when we strike a piano key, the level (envelope) of the acoustical wave decreases and we can hear this very distinctly. In both cases (pure tone and piano key) the firing rate of fibers decreases, but in one case we hear a constant loudness and in the other case a decreasing loudness.
Finally, it is not just that some high-level property of sound feels constant, but with a pure tone we are simply unable to hear any variation in the sound at all, whether in loudness or in any other quality.
This discussion raises the question: what does it mean that something changes perceptually? To (tentatively) answer this question, I will start with pitch constancy. A pure tone feels like it has a constant pitch. If its frequency is progressively increased, then we feel that the pitch increases. If the frequency remains constant, then the pure tone feels like a completely constant percept. We do not feel the acoustical pressure going up and down. Why? A pure tone has the characteristic property that, from the observation of a few periods of the wave, it is possible to predict the entire future wave. Pitch is indeed associated with the periodicity of the sound wave. If the basis of what we perceive as pitch is this periodicity relationship, then as the acoustical wave unfolds, this relationship (or law) remains constantly valid, and so the perceived pitch should remain constant. There is some variation in the acoustical pressure, but not in the law that the signal follows. So there is in fact some constancy, but at the level of the relationships or laws that the signal follows. I would propose that the pure tone feels constant because the signal never deviates from the perceptual expectation.
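The idea that a pure tone obeys a fixed law can be illustrated numerically. Any sampled sinusoid satisfies the recurrence x[n+1] = 2 cos(w) x[n] - x[n-1], where w is the angular frequency per sample, so in the idealized noise-free case three samples are enough to estimate the law and extrapolate the wave. The sketch below is only an illustration of this predictability, not a claim about how the auditory system forms expectations; the function name and the numerical values are my own assumptions.

```python
import math

def extrapolate_tone(first_three, n_total):
    """Continue a sampled sinusoid from its first three samples.

    Uses the identity x[n+1] = 2*cos(w)*x[n] - x[n-1], which holds for
    any sampled sinusoid with angular frequency w (radians per sample).
    """
    x0, x1, x2 = first_three
    two_cos_w = (x0 + x2) / x1      # the "law", estimated from the data
    out = [x0, x1, x2]
    while len(out) < n_total:
        out.append(two_cos_w * out[-1] - out[-2])
    return out

fs, f0, phase = 16000, 440.0, 0.3   # illustrative values (assumed)
true_wave = [math.sin(2 * math.pi * f0 * n / fs + phase) for n in range(2000)]

predicted = extrapolate_tone(true_wave[:3], len(true_wave))
max_error = max(abs(p - t) for p, t in zip(predicted, true_wave))
print(f"largest prediction error over 2000 samples: {max_error:.2e}")
```

The pressure values keep changing from sample to sample, but the single coefficient that generates them does not: the prediction error stays at the level of floating-point rounding. A frequency glide or an amplitude change would break the recurrence, which is the sense in which it would violate the expectation.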
This hypothesis about perceptual constancy implies several non-trivial facts: 1) how sensory signals are presented to the system (in the form of spike trains or acoustical signals) is largely irrelevant, if these specific aspects of presentation (or “projection”) can be included in the expectation; 2) signal variations are not perceived as variations if they are expected; 3) signal variations are not perceived if there is no expectation. This last point deserves further explanation. To perceive a change, an expectation must be formed prior to this change, and then violated: the variation must be surprising, and surprise is defined by the relation between the expectation (which can be precise or broad) and the variation. So if there is no expectation (that is, if the expectation is so broad that any variation is compatible with it), then we cannot perceive variation.
From this hypothesis it follows that a completely predictable acoustical wave such as a pure tone should produce a constant percept. Let us come back to the initial problem, loudness constancy, and consider that the firing rate of auditory nerve fibers adapts. For a tone of constant intensity, the firing rate decays at some speed. For tones of increasing intensity, the firing rate might decay more slowly, or even increase. For tones of decreasing intensity, the firing rate would decay faster. How is it that constant loudness corresponds to the specific speed of decay that is obtained for the tone of constant intensity, if the auditory system never has direct access to the acoustical signals?
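The three cases can be made concrete with a deliberately crude toy model, in which the firing rate is the stimulus level multiplied by a gain that decays exponentially after onset towards a floor. This multiplicative form, the 50 ms time constant, the floor of 0.4 and the linear level ramps are all assumptions chosen for illustration; they are not a fitted model of auditory nerve adaptation.

```python
import math

def adapted_rate(levels, dt, tau=0.05, floor=0.4):
    """Toy firing rate: stimulus level times a gain that decays after onset."""
    return [level * (floor + (1.0 - floor) * math.exp(-k * dt / tau))
            for k, level in enumerate(levels)]

dt, n = 0.001, 300                  # 1 ms steps over 300 ms (assumed)
t = [k * dt for k in range(n)]
profiles = {
    "constant":   [1.0 for _ in t],
    "increasing": [1.0 + 2.0 * tk for tk in t],
    "decreasing": [1.0 - 2.0 * tk for tk in t],
}

rates = {name: adapted_rate(levels, dt) for name, levels in profiles.items()}
for name, r in rates.items():
    print(f"{name:>10}: onset {r[0]:.2f}, after 100 ms {r[100]:.2f}, end {r[-1]:.2f}")
```

In this toy model the rate drops after onset in all three cases, and nothing in the output singles out the constant-intensity curve as the one that should sound constant. That is the puzzle in miniature: the reference decay would have to be part of what the system expects.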
Loudness constancy seems more difficult to explain than pitch constancy. I will start with the ecological viewpoint. In an ecological environment, many natural sounds are transient (e.g. impacts) and therefore do not have constant intensity. However, even though the intensity of an impact sound decays, its perceived loudness may not decay, i.e., it may be perceived as a single timed sound (e.g. a footstep). There are also natural sounds that are stationary and therefore have constant intensity, at least at a large enough timescale: a river, the wind. However, these sounds do not address the problem of neural adaptation, as adaptation only applies to sounds with a sharp onset. Finally, vocalizations have a sharp onset and slowly varying intensity (although this might be questionable). Thus, for a vocalization, the expected intensity profile is constant, and therefore it could be speculated that this explains the relationship between constant loudness and constant intensity, despite variations at the neurophysiological level.
A second line of explanation is related to the view of loudness as a perceptual correlate of intelligibility. A pure tone presented in a stationary background has constant intelligibility (or signal-to-noise ratio), and this fact is independent of any further (non-destructive) processing applied to the acoustical wave. Therefore, the fact that loudness is constant for a pure tone is consistent with the view that loudness primarily reflects the intelligibility of sounds.