Musical notes have a particular perceptual quality called “pitch”. Pitch is the percept corresponding to how low or high a musical note is. Vowels also have a pitch. To a large extent, the pitch of a periodic sound corresponds to its repetition rate. The important point is that what matters for pitch is the periodicity more than the frequency content. For example, a periodic sound with repetition rate f0 has frequency components at integer multiples of f0 (n·f0), which are called harmonics. A pure tone of frequency f0 and a complex tone containing all harmonics except the first one (i.e., with no frequency component at f0) evoke the same pitch. It is in fact a little more complex than that and there are many subtleties, but I will not go into these details in this post. Here I simply want to describe the kind of sensory or sensorimotor structure there is in pitch. It turns out that pitch has a surprisingly rich structure.
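As a rough illustration of the point about periodicity versus frequency content, here is a minimal Python sketch. It is not a model of pitch perception: it only checks that a pure tone at f0 and a complex tone built from harmonics 2 to 5 (no component at f0) repeat with the same period. The sampling rate, the value of f0, the lag search range, and the function names `harmonic_tone` and `best_lag` are all assumptions chosen for the example.

```python
import math

def harmonic_tone(harmonics, f0, fs, n_samples):
    """Sum of equal-amplitude sine harmonics k*f0, sampled at rate fs."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in harmonics)
        for n in range(n_samples)
    ]

def best_lag(x, min_lag, max_lag, window):
    """Lag (in samples) at which x best matches a shifted copy of itself."""
    def autocorr(lag):
        return sum(x[n] * x[n + lag] for n in range(window))
    return max(range(min_lag, max_lag + 1), key=autocorr)

fs = 8000                    # sampling rate in Hz (assumed)
f0 = 200                     # repetition rate in Hz (assumed)
window = 400                 # 10 full periods of f0 at this sampling rate
min_lag, max_lag = 10, 60    # search range: excludes lag 0 and 2 periods
n_samples = window + max_lag

pure = harmonic_tone([1], f0, fs, n_samples)              # only f0
missing = harmonic_tone([2, 3, 4, 5], f0, fs, n_samples)  # no f0 component

pure_lag = best_lag(pure, min_lag, max_lag, window)
missing_lag = best_lag(missing, min_lag, max_lag, window)

# Repetition rate implied by the best-matching delay, for each sound
print(fs / pure_lag, fs / missing_lag)
```

With these settings both sounds are best matched by the same delay of 40 samples, that is, a repetition rate of 200 Hz, even though the second sound has no energy at 200 Hz. The search range is deliberately limited so that the trivial match at lag 0 and the match at two periods are excluded.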
The most obvious type of structure is periodicity. Pitch-evoking sounds have this very specific property that the acoustical wave is unchanged when temporally shifted by some delay. This delay is characteristic of the sound’s pitch (i.e., same period means same pitch). This is the type of structure that is emphasized in temporal theories of pitch. This is what I call the “similarity structure” of the acoustical signal, and this notion can in fact be extended and accounts for a number of interesting phenomena related to pitch. But this is work in progress, so I will discuss it further at a later time.
Another way to see periodic sounds is to realize that a periodic sound is predictable. That is, after a couple of periods, one can predict the future acoustical wave. Compared to most other sounds, periodic sounds have a very high degree of predictability. Perhaps the perceptual strength of pitch (which depends on a number of factors) is related to the degree of predictability of the sound.
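The notion of predictability can be made concrete with a small Python sketch, given here only as an illustration under simple assumptions. The predictor is the crudest one possible: the future sample is assumed to equal the sample one delay earlier, with the delay chosen on the first half of the signal and evaluated on the second half. The signals, the sampling rate, the lag range, and the function names `prediction_error` and `future_error` are hypothetical choices for the example.

```python
import math
import random

def prediction_error(x, lag, start, stop):
    """Normalized squared error when predicting x[n] by x[n - lag]."""
    num = sum((x[n] - x[n - lag]) ** 2 for n in range(start, stop))
    den = sum(x[n] ** 2 for n in range(start, stop))
    return num / den

def future_error(x, lags, split):
    """Choose the best delay on x[:split], then test it on x[split:]."""
    first = max(lags)  # earliest index for which x[n - lag] exists
    best = min(lags, key=lambda lag: prediction_error(x, lag, first, split))
    return prediction_error(x, best, split, len(x))

fs, f0, n_samples = 8000, 200, 800   # assumed values; period = 40 samples
lags = range(10, 61)
split = n_samples // 2

# Periodic sound: harmonics 1 to 3 of f0
periodic = [
    sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in (1, 2, 3))
    for n in range(n_samples)
]

# Aperiodic sound: Gaussian white noise (fixed seed for reproducibility)
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

periodic_err = future_error(periodic, lags, split)
noise_err = future_error(noise, lags, split)
print(periodic_err, noise_err)
```

For the periodic sound the error on the unseen half is zero up to rounding, while for white noise it is of the order of the signal energy itself, since past samples carry no information about future ones. A graded measure of this kind is one possible way to quantify the "degree of predictability" mentioned above, but it is only a sketch.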
There is another type of structure that is in some sense orthogonal to the similarity structure I just described, which one might call the “dissimilarity structure”. Natural sounds (apart from vocalizations) tend to have a smooth spectrum. Periodic sounds, on the other hand, have a discrete spectrum. Thus, in some sense, periodic sounds have a “surprisingly discontinuous” spectrum. Suppose for example that two auditory receptors respond to different but overlapping parts of the spectrum (e.g., two nearby points on the basilar membrane). Then one can usually predict the sensory input to the second receptor given the sensory input to the first receptor, because natural sounds tend to have a continuous spectrum. But this prediction would fail with a periodic sound. Periodic sounds are maximally surprising in this sense. The interesting thing about the dissimilarity structure of pitch is that it accounts for binaural pitch phenomena such as Huggins’ pitch: noise with a flat spectrum is presented to both ears, and the interaural phase difference changes abruptly at a given frequency; a tone is perceived, with a pitch corresponding to that frequency.
Thus, pitch-evoking sounds simultaneously have two types of structure that distinguish them from other types of sounds: the similarity structure, which consists of different views of the acoustical signal that are unusually similar, and the dissimilarity structure, which consists of different views of the acoustical signal that are unusually dissimilar. The first type of structure corresponds to what I examined in my paper on computing with neural synchrony. It is important to notice that these two types of structure are different in nature. The similarity structure corresponds to a law that the sensory signals follow. Here the percept is associated with the specific law that these signals follow. The dissimilarity structure corresponds to the breaking of a law that sensory signals usually follow. Here the percept is associated with a law that is specific not to the presented sensory signals, but to the usual sensory signals. Thus we might relate the similarity structure to the notion of discovery, and the dissimilarity structure to the notion of surprise (and perhaps the term “structure” is not appropriate for the latter).
So far, I have only considered the structure of the acoustical signal, but one may also consider the sensorimotor structure of pitch. As I mentioned in another post, periodic sounds are generally produced by living beings, so it makes sense to examine these sounds from the viewpoint of their production. When one produces a pitch-evoking sound (for example a vowel, or when one sings), there is a very rich structure that goes beyond the acoustical structure. First, there is proprioceptive information about vocal muscles and tactile information about the vibrations of the larynx, and both are directly related to the period of sounds. There is also the efference copy, i.e., the motor commands issued to make the vocal folds vibrate in the desired way. For a person who can produce sounds, pitch is then associated with a rich and meaningful sensorimotor structure. In fact, the sensorimotor theory of pitch perception would be that to perceive the pitch of a sound is, perhaps, to perceive the movements that would be required to produce such an acoustical structure. An interesting aspect of this view is that it provides some meaning to the notion of how low or high a pitch-evoking sound is, by associating it with the state of the different elements involved in sound production. For example, producing a high sound requires increasing the tension of the vocal cords and moving the larynx up (higher!). One question then is whether congenitally mute people have a different perception of pitch.
Observe that, as for binaural hearing, the sensorimotor structure of pitch should not be understood as the relationship between motor commands and auditory signals, but rather as the relationship between motor commands and the structure of auditory signals (e.g., the periodicity). In this sense, it is a higher-order structure.