In the previous post, I discussed proximal aspects of loudness, which depend on the acoustical wave at the ear. For example, when we say that a sound is too loud, we are referring to an unpleasant feeling related to the effect of acoustical waves sensed at the ear. That is, the same sound source would feel less loud if it were far from the ear.
But we can also perceive the “intrinsic” loudness of a sound source, that is, that aspect of loudness that is not affected by distance. This is a distal property, which I will call source loudness. The loudness of a sound can be defined as proximal or distal in the same way as a visual object has a size on the retina (proximal) and a physical size as an external object (distal).
First of all, what can possibly be meant by source loudness? We may consider that it is a perceptual correlate of an acoustical property of the sound source, for example the energy radiated by the source. The acoustical energy at a given point decreases with the distance to the source according to the inverse square law, but the total energy at a given distance (integrated over the whole surface of a sphere centered on the source) is constant (neglecting reflections). However, we cannot sense this kind of invariant, since it would require sampling the acoustical wave over the entire space (but see the last comments in this post).
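To make the inverse square law concrete, here is a minimal sketch. It assumes an idealized point source radiating uniformly in all directions in free field (no reflections, no air absorption), and it works with radiated power rather than energy. The function names and the 0.01 W value are only illustrative.

```python
import math

def intensity_at(power_w, distance_m):
    """Intensity (W/m^2) at distance_m from an ideal point source in free field."""
    sphere_area = 4.0 * math.pi * distance_m ** 2
    return power_w / sphere_area

def power_through_sphere(power_w, distance_m):
    """Integrate the intensity over the whole sphere: recovers the radiated power."""
    return intensity_at(power_w, distance_m) * 4.0 * math.pi * distance_m ** 2

def level_change_db(near_m, far_m, power_w=1.0):
    """Level difference in dB when the listener moves from near_m to far_m."""
    ratio = intensity_at(power_w, far_m) / intensity_at(power_w, near_m)
    return 10.0 * math.log10(ratio)

if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0):
        # Intensity at one point falls with distance; the sphere total does not.
        print(d, intensity_at(0.01, d), power_through_sphere(0.01, d))
```

In this idealized setting the level at a single point drops by about 6 dB for each doubling of distance, while the quantity integrated over the sphere stays equal to the radiated power. That integrated quantity is the invariant that a listener at a single location cannot sample directly.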
An alternative is to consider that source loudness is that property of sound that does not vary with distance, and more generally with the acoustical environment. The problem with this definition is that it applies to all distal properties of sound (pitch, speaker identity, etc.). The fact that we refer to source loudness using the word loudness suggests that there is a relationship between proximal and distal loudness. Therefore, we may consider that source loudness is that property of sound that is univocally related to (expected) proximal loudness at a reference location (say, at arm's length). Defined in this way, source loudness indeed does not depend on distance. Indirectly, it is a property of the sound field radiated by the source, although it is defined with reference to the perceptual system (since proximal loudness is not an intrinsic property of acoustical waves, as I noted in the previous post).
Another way to define source loudness involves action. For example, we can produce sounds by hitting different objects. The loudness of the sound correlates with the energy we put into the action. So we could imagine that source loudness corresponds to the energy we estimate necessary to produce the sound. This also gives a definition that does not depend on source distance. However, hitting the ground produces a sound that feels louder when the ground is hard (concrete) than when it is soft (grass, snow). Some of the energy we put into the mechanical interaction is dissipated and some is radiated, and it seems that only the radiated energy contributes to perceived loudness. Therefore, this definition is not entirely satisfying. I would not entirely discard it, because, as we have seen, loudness is not a unitary percept. It might also be relevant for speech: when we say that someone screams or speaks softly, we are referring to the way the vocal cords are excited. Thus, this is a distal notion of loudness that is object-specific.
So we have two notions of source loudness, both of which are invariant with respect to distance. One is related to proximal loudness; the other one is related to the mechanical energy required to produce the sound and is source-specific (in the sense that the source must be known). The next question is: what exactly in the acoustical waves is invariant with respect to distance? In Gibsonian terms, what is the invariant structure related to source loudness?
Let us start with the second notion. What in the acoustical wave specifies the energy put into the mechanical interaction? If this property is invariant to distance, then it should be invariant with respect to scaling the acoustical wave. If the relationship between interaction strength and the resulting wave were linear, a stronger interaction would simply scale the wave, which is exactly what a change in distance does, and the two could not be told apart. It follows that such information can only be captured if the relationship between interaction strength and the resulting wave is nonlinear. Source loudness in this sense is therefore a measure of nonlinearity that is source-specific.
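The argument can be illustrated with a toy example. This is only a sketch under strong assumptions: distance is modeled as a pure gain, and the "nonlinear source" is a hypothetical saturating one (a tanh applied to a sine), chosen for convenience and not as a model of any real object. The scale-invariant feature used here is the ratio of the third harmonic to the fundamental.

```python
import math

N = 1024   # number of samples in the analysis window
F0 = 8     # cycles of the fundamental within the window

def tone():
    """Unit-amplitude sine with exactly F0 cycles over N samples."""
    return [math.sin(2.0 * math.pi * F0 * i / N) for i in range(N)]

def harmonic_amplitude(signal, cycles):
    """Amplitude of the sinusoidal component with `cycles` cycles per window."""
    re = sum(s * math.cos(2.0 * math.pi * cycles * i / N) for i, s in enumerate(signal))
    im = sum(s * math.sin(2.0 * math.pi * cycles * i / N) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / N

def distortion_ratio(signal):
    """Third harmonic relative to the fundamental: unchanged by overall scaling."""
    return harmonic_amplitude(signal, 3 * F0) / harmonic_amplitude(signal, F0)

def linear_source(strength):
    """Hypothetical linear source: strength only scales the waveform."""
    return [strength * x for x in tone()]

def nonlinear_source(strength):
    """Hypothetical saturating source: strength changes the waveform shape."""
    return [math.tanh(strength * x) for x in tone()]

def propagate(signal, gain):
    """Distance modeled as a pure gain (an assumption of this sketch)."""
    return [gain * s for s in signal]

if __name__ == "__main__":
    for strength in (0.5, 1.0, 3.0):
        near = distortion_ratio(nonlinear_source(strength))
        far = distortion_ratio(propagate(nonlinear_source(strength), 0.1))
        print(strength, near, far, distortion_ratio(linear_source(strength)))
```

For the saturating source, the ratio grows with interaction strength and is unchanged by the distance gain, so it specifies strength independently of distance. For the linear source, the ratio is (numerically) zero at every strength, so no scale-invariant feature of this kind can reveal how hard the source was excited.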
The first notion defines source loudness in relationship to proximal loudness at a reference distance. What in the acoustical wave specifies the proximal loudness that would be perceived at a different distance? One possibility is that this involves a highly inferential process: source loudness is first perceived in a source-specific way (previous paragraph), and then associated with proximal loudness. Another inferential process would be: distance is estimated using various cues, and then proximal loudness at a reference distance is inferred from proximal loudness at the current location. One such cue is the spectrum: the air absorbs high frequencies more than low frequencies, and therefore distant sounds have less high-frequency content. Of course this is an ambiguous cue, since the spectrum at the ear also depends on the spectrum at the source, so it is only a cue in a statistical sense (i.e., given the expected spectral shape of natural sounds).
There is another possibility that was tested with psychophysical experiments in a very interesting study (Zahorik & Wightman, 2001). The subjects listen to noise bursts played from various distances at various intensities, and are asked to evaluate source loudness. The results show that 1) the evaluation of loudness does not depend on distance, 2) the scale of loudness depends on source intensity in the same way as for proximal loudness (loudness at the ears). This may seem surprising: the sounds have no structure, they do not have the typical spectrum of natural sounds (which tends to decay as 1/f), and there is no nonlinearity involved. The key is that the sounds were presented in a reverberant environment (a room). The authors propose that loudness constancy is due to the properties of diffuse fields. In acoustics, a diffuse field has the property that its energy density is the same at all spatial locations within the environment. This is never entirely true of natural environments, but some reverberant environments are close to it. This implies that the reverberant part of the signal depends linearly on the source signal but does not depend on distance. Therefore, the reverberant part is invariant with respect to source location and can provide the basis for the notion of source loudness that we are considering. Reverberation preserves the spectrum of the source signal, but not the temporal envelope (which is blurred). However, we note that since reverberation depends on the specific acoustical environment, it is in principle only informative about the relative loudness of different sources; but it is important to observe that it allows comparisons between different types of sources.
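The diffuse-field argument can be sketched with the classical approximation from statistical room acoustics, in which the direct intensity follows the inverse square law and the reverberant intensity is 4W/R, where W is the source power and R the room constant. This is a textbook idealization, not the analysis used in the study, and the power and room-constant values below are hypothetical.

```python
import math

def intensity_parts(power_w, distance_m, room_constant_m2):
    """Return (direct, reverberant) intensity in W/m^2 for an omnidirectional source.

    direct      : inverse square law, depends on distance
    reverberant : ideal diffuse field, 4 * W / R, independent of distance
    """
    direct = power_w / (4.0 * math.pi * distance_m ** 2)
    reverberant = 4.0 * power_w / room_constant_m2
    return direct, reverberant

if __name__ == "__main__":
    room_constant = 50.0  # hypothetical room constant, in m^2
    for power in (0.001, 0.002):
        for distance in (1.0, 2.0, 8.0):
            direct, reverberant = intensity_parts(power, distance, room_constant)
            print(power, distance, direct, reverberant)
```

In this idealization the reverberant term is the same at every listening distance and doubles when the source power doubles, which is the invariance the argument relies on. The room constant sets the absolute value, which is why the cue is informative mainly about relative source loudness within a given environment.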
Alternatively, the ratio between direct and reverberant energy provides a way to estimate the distance of the source, from which source loudness can be deduced. But we note that estimating the distance is in fact not necessary to estimate source loudness. The study does not mention cues due to early reflections from the ground. Indeed, a reflection from the ground interferes with the direct signal at a specific frequency that is inversely proportional to the delay between direct and reflected signals. This could be a monaural or binaural cue to distance (Gourévitch & Brette 2012).
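As an illustration of the ground-reflection cue, here is a minimal geometric sketch. It assumes a flat, perfectly reflecting rigid ground (no phase inversion on reflection), a point source and a point receiver, and a speed of sound of 343 m/s. The heights and distances are hypothetical, and this is not the model of the cited paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def reflection_delay_s(distance_m, source_height_m, ear_height_m):
    """Extra travel time of the ground-reflected path (image-source geometry)."""
    direct = math.hypot(distance_m, source_height_m - ear_height_m)
    reflected = math.hypot(distance_m, source_height_m + ear_height_m)
    return (reflected - direct) / SPEED_OF_SOUND

def first_notch_hz(distance_m, source_height_m, ear_height_m):
    """Lowest frequency at which direct and reflected waves cancel: 1 / (2 * delay)."""
    delay = reflection_delay_s(distance_m, source_height_m, ear_height_m)
    return 1.0 / (2.0 * delay)

if __name__ == "__main__":
    for distance in (1.0, 2.0, 4.0, 8.0):
        # Source and ear both at 1.5 m above the ground (hypothetical values).
        print(distance, first_notch_hz(distance, 1.5, 1.5))
```

With source and ear at fixed heights, the delay shrinks as the source moves away, so the interference frequency rises with distance. This is what would make it usable as a distance cue, under the assumptions stated above.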
To conclude this post, we have seen that loudness actually encompasses several distinct notions:
1) a proximal notion that is related to intelligibility (as in “not loud enough”), and therefore to the relationship between the signal of interest and the background, considered as a distracter;
2) a proximal notion that is related to biological responses to the acoustical signal (as in “too loud”), which may (speculatively) be numerous (energy consumption, risk of cochlear damage, startle reflex);
3) a distal notion that relates to the mechanical energy involved in producing the sound (a sensorimotor notion), which is source-specific;
4) a distal notion that relates to the sound field radiated from the source, independently of the distance, which may be defined as the expected proximal loudness at a reference distance.
