An intriguing fact about the pitch of tones is that we tend to describe it using spatial characteristics such as “high” and “low”. In the same way, we speak of a rising intonation when the pitch increases. A sequence of notes with increasing frequency played on a piano scale is described as going “up” (even though it is going right on a piano, and going down on a guitar). Yet there is nothing intrinsically spatial in the frequency of a tone. Why do we use these spatial attributes? An obvious possibility is that it is purely cultural: “high” and “low” are just arbitrary words that we happen to use to describe these characteristics of sounds. However, the following observations should be made:

- We use the terms low and high, which are also used for spatial height, and not specific words such as blurp and zboot. But we don’t use spatial words for colors and odors. Instead, we use specific words (red, green) or sometimes words borrowed from other senses (a hot color). Why use space and not something else?
- All languages seem to use more or less the same type of words.
- In an experiment done in 1930 by Carroll Pratt (“The spatial character of high and low tones”), subjects were asked to locate tones of various frequencies on a numbered scale running from the floor to the ceiling. The tones were presented through a speaker behind a screen, placed at a random height. It turned out that the subjects’ judgments of spatial height were very consistent, but were entirely determined by tone frequency rather than by actual source position: high-frequency tones were placed near the ceiling, low-frequency tones near the floor. The result was later confirmed in congenitally blind persons and in young children (Roffler & Butler, JASA 1968).

Thus, there is some support for the hypothesis that tones are perceived to have a spatial character, which is reflected in language. But why? Here I will just speculate freely and make a list of possibilities.

1. Sensorimotor hypothesis related to vocal production: when one makes sounds (sings or speaks), high-pitch sounds are felt to be produced higher than low-pitch sounds. This could be related to the spatial location of tactile vibrations on the skin, which depends on fundamental frequency or timbre. Professional singers indeed use spatial words to describe where the voice “comes from” (which has no physical basis as such). This could be tested by measuring skin vibrations. In addition, the hypothesis predicts that congenitally mute people would show different patterns of tone localization.
2. Natural statistics: high-frequency sounds tend to come from sources that are physically higher than those of low-frequency sounds. For example, footsteps on the ground tend to produce low-frequency sounds. Testing this hypothesis would require an extensive collection of natural recordings tagged with their spatial position. But note that the opposite trend holds for sounds produced by humans: adults have lower voices than children, who are shorter.
3. Elevation-dependent spectral cues: to estimate the elevation of a sound source, we rely on spectral cues introduced by the pinnae. Indeed, the folds of the pinnae introduce elevation-dependent notches in the spectrum. Through learning, the frequency of a tone could become associated with the spectral characteristics of a particular elevation. This could be tested by running a tone localization experiment and comparing the results with individual head-related transfer functions.