What kinds of sounds are there in the world? This is essentially the question William Gaver addresses in a very interesting paper (Gaver, 1993), in which he describes an ontology of sounds, categorized by the type of interaction. There are three categories: sounds made by solids, liquids and gases. An example of a sound made by a liquid is dripping. There are also hybrid sounds, such as rain falling on a solid surface. It makes sense to categorize sounds based on the nature of the objects because the mechanical events are physically very different. For example, in sounds involving solids (e.g. a footstep), energy is transmitted at the interface between two solids, which is a surface, and the volumes are put in motion (i.e., they are deformed). This is completely different for sounds involving gases, e.g. wind. In mechanical events involving solids, the shape is essentially unchanged (only transiently deformed). This is a sort of structural invariance that ought to leave a specific signature on the sounds (more on this in another post). Sounds made by gases, on the other hand, correspond to irreversible changes.
These three categories correspond to the physical nature of the sound-producing substances. There are subcategories that correspond to the nature of the mechanical interaction. For example, a solid object can be hit or it can be scraped. The same object vibrates in both cases, but it is made to vibrate in a different way. Each type of interaction also ought to produce some common structure in the auditory signals, as is explained in Gaver's companion article. For example, a vibrating solid object has modes of vibration that are determined by its shape (more on this in another post). These modes do not depend on the type of interaction with the object.
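One way to picture the independence between modes and interaction is a source-filter sketch: the object is a fixed set of damped sinusoids, and the interaction is the excitation that drives it. This is a common modelling assumption rather than something stated in this post, and the mode values, the sample rate and the function names below are all made up for illustration.

```python
import math
import random

def modal_impulse_response(modes, sample_rate=8000, n_samples=400):
    """Sum of exponentially damped sinusoids, one per mode.

    modes: list of (frequency_hz, decay_per_second, amplitude).
    These values stand in for the object's shape and material.
    """
    response = []
    for k in range(n_samples):
        t = k / sample_rate
        response.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                            for f, d, a in modes))
    return response

def excite(excitation, response):
    """Discrete convolution: the sound is the excitation filtered by the object."""
    out = [0.0] * (len(excitation) + len(response) - 1)
    for i, x in enumerate(excitation):
        if x == 0.0:
            continue
        for j, h in enumerate(response):
            out[i + j] += x * h
    return out

# Hypothetical object with two modes (assumed values, not measurements).
MODES = [(440.0, 60.0, 1.0), (1130.0, 90.0, 0.5)]
response = modal_impulse_response(MODES)

# Hitting: a single brief impulse. Scraping: a sustained, irregular excitation.
hit = [1.0] + [0.0] * 99
rng = random.Random(0)
scrape = [rng.uniform(-0.1, 0.1) for _ in range(100)]

hit_sound = excite(hit, response)
scrape_sound = excite(scrape, response)
```

In this sketch the two sounds differ only through the excitation; the frequencies and decay rates in MODES are shared, which is the sense in which the modes belong to the object and not to the interaction.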
Interactions that are localized in time produce impact sounds, while continuous interactions produce auditory textures. These are two very distinct types of sounds. Both have a structure, but auditory textures, it seems, only have a structure in a statistical sense (see McDermott & Simoncelli, 2011). Another kind of auditory texture is the type of sound produced by a river, for example. These sounds also have a structure in a statistical sense. An interesting aspect, in this case, is that these sounds are not spatially localized: they do have an auditory size (see my post on spatial hearing).
The examples I have described correspond to what Gaver calls "basic level events", elementary sounds produced by a single mechanical interaction. There are also complex events, which are composed of simple events. For example, a breaking sound is composed of a series of impact sounds. A bouncing sound is also composed of a series of impact sounds, but the temporal patterning is different, because it is lawful (predictable) in the case of a bouncing sound. Walking is yet another example of a series of impact sounds, which is also lawful, but it differs in the temporal patterning: it is approximately periodic.
Gaver only describes sounds made by non-living elements of the environment (except perhaps for walking). But there are also sounds produced by animals. I will describe them now. First, some animals can produce vocalizations. In Gaver's terminology, vocalizations are a sort of hybrid gas-solid mechanical event: periodic pulses of air make the vocal folds vibrate. The sound then resonates in the vocal tract, which shapes the spectrum of the sound (in much the same way as the shape of an object determines the resonating modes of impact sounds). One special type of structure in these sounds is the periodicity of the sound wave. The fact that a sound is periodic is highly meaningful, because it means that energy is continuously provided, and therefore that a living being is most likely producing it. There are also many other interesting aspects that I will describe in a later post.
Animals also produce sounds by interacting with the environment. These are the same kinds of sounds as described by Gaver, but I believe there is a distinction. How can you tell that a sound has been produced by a living being? Apart from identifying specific sounds, I have two possible answers to provide. First, in natural non-living sounds, energy typically decays. This distinguishes walking sounds from bouncing sounds, for example. In a bouncing sound, the energy decreases at each impact. This means that both the intensity of the sound and the interval between successive impacts decay. This is simply because a bouncing ball starts its movement with a potential energy, which can only decay. In a walking sound, roughly the same energy is brought at each impact, so it cannot be produced by the collision of two solids. Therefore, sounds contain a signature of whether they are produced by a continuous source of energy. But a river is also a continuous source of energy (and the same would apply to all auditory textures). Another specificity is that sounds produced by the non-living environment are governed by the laws of physics, and therefore they are lawful in a sense, i.e., they are predictable. A composed sound with a non-predictable pattern (even in a statistical sense) is most likely produced by a living being. In a sense, non-predictability is a signature of decision making. This remark is not specific to hearing.
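The contrast between bouncing and walking can be made concrete with a small sketch. It assumes an idealized ball dropped without air resistance that keeps a fixed fraction of its speed at each bounce (a coefficient of restitution, here called e) and a walker who delivers the same energy at a fixed period. None of these parameters come from Gaver's paper; the names and default values are illustrative only.

```python
import math

def bounce_impacts(h0=1.0, e=0.8, n=6, g=9.81):
    """Impact times (s) and impact energies per unit mass (J/kg) for a ball
    dropped from height h0 (m). e is an assumed coefficient of restitution."""
    v = math.sqrt(2 * g * h0)   # speed at the first impact
    t = v / g                   # time of the first impact
    times, energies = [], []
    for _ in range(n):
        times.append(t)
        energies.append(0.5 * v * v)
        v *= e                  # the ball leaves the ground more slowly
        t += 2 * v / g          # up-and-down flight time until the next impact
    return times, energies

def walk_impacts(period=0.5, energy=1.0, n=6):
    """Idealized walking: same energy at every step, constant interval."""
    return [k * period for k in range(n)], [energy] * n

def intervals(times):
    """Gaps between successive impacts."""
    return [b - a for a, b in zip(times, times[1:])]

bounce_times, bounce_energies = bounce_impacts()
walk_times, walk_energies = walk_impacts()

print("bounce intervals:", [round(x, 3) for x in intervals(bounce_times)])
print("walk intervals:  ", [round(x, 3) for x in intervals(walk_times)])
```

Under these assumptions each bounce interval is e times the previous one and each impact energy is e squared times the previous one, so both shrink geometrically, whereas the walking sequence stays flat in both respects.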
These are specificities of sounds produced by living beings, as heard by another observer. But one can also hear self-produced sounds. There are two new specificities about these types of sounds. First, they also make the body vibrate, for example when a foot hits the ground. This produces sound waves with a specific structure. But more importantly, self-produced sounds have a sensorimotor structure. Scraping corresponds to a particular way in which one interacts with an object. The time of impact corresponds to the onset of the sound. The intensity of the sound is directly related to the energy with which an object is hit. Finally, the periodicity of vocalizations (i.e., the pitch) corresponds to the periodicity of self-generated air pulses through the vocal folds, and the formant frequencies correspond to the shape of the vocal tract. Self-generated sounds also have a multimodal structure. For example, they produce vibrations in the body that can be perceived by tactile receptors. In the next post, I will look at the structure of pitch.