What is sound? Physically, sounds are mediated by acoustical waves. But vision is mediated by light waves and yet hearing does not feel like vision. Why is that?
There are two wrong answers to this question. The first one is that the neural structures are different. Sounds are processed in the cochlea and in the auditory cortex, images by the retina and visual cortex. But then why doesn’t a sound evoke some sort of image, like a second visual system? This point of view does not explain much about perception, only about what brain areas “light up” when a specific type of stimulus is presented. The second one is that the physical substrate is different: light waves vs. acoustic waves. This is also a weak answer, for what is fundamentally different between light and acoustic waves that would make them “feel” different?
I believe the ecological approach provides a more satisfying answer. By this, I am referring to the ecological theory of visual perception developed by James Gibson. It emphasizes the structure of sensory signals collected by an observer in an ecological environment. It is also related to the sensorimotor account of perception (O’Regan & Noë 2001), which puts the emphasis on the relationship between movements and sensory signals, but I will show below that this emphasis is less relevant in hearing (except in spatial hearing).
I will quickly summarize what vision is in Gibson’s ecological view. Illumination sources (the sun) produce light rays that are reflected by objects. More precisely, light is reflected at the interface between the surface of objects and the medium (air, or possibly water). What is available for visual perception are surfaces and their properties (color, texture, shape...). Both the illumination sources and the surfaces in the environment are generally persistent. The observer can move, and this changes the light rays received by the retina. But these changes are highly structured because the surfaces persist, and this structure is informative of the surfaces in the environment. Thus what the visual system perceives is the arrangement and properties of persistent surfaces. Persistence is crucial here, because it allows the observer to use its own movements to learn about the world – in the sensorimotor account of perception, perception is precisely the implicit knowledge of the effect of one’s actions on sensory signals.
On the other hand, sounds are produced by the mechanical vibration of objects. This means that sounds convey information about volumes rather than surfaces. They depend on the shape but also on the material and internal structure of objects. It also means that what is perceived in sounds is the source of the waves rather than their interaction with the environment. Crucially, contrary to vision, the observer cannot directly interact with sound waves, because a sound happens; it does not persist. An observer can produce a sound wave, for example by hitting an object, but once the sound is produced there is no possible further interaction with it. The observer cannot move to analyze the structure of acoustic signals. The only available information is in the sound signal itself. In this sense, sounds are events.
These ecological observations highlight major differences between vision and hearing, which go beyond the physical basis of these two senses (light waves and acoustic waves). Vision is the perception of persistent surfaces. Hearing is essentially the perception of mechanical events on volumes. These remarks are independent of the fact that vision is mediated by a retina and hearing by a cochlea.
I like the idea that whereas vision concerns surfaces, sounds concern volumes.
But you say that whereas vision is persistent, sound is not: sounds are "events". I think that sounds have several aspects. Some of them concern the nature of the source of the sound. Some of them concern the spatial location of the source. The spatial location of the source can persist, because the source can repeatedly make a sound, as in the case of a speaker at a cocktail party.
Absolutely, I actually discuss this in the third post of the series (spatial hearing). What I mean is that the acoustical wave itself is an event: you cannot directly interact with it because of the arrow of time. However, there can be invariants in the structure of repeated events. In this case, what is repeated is generally not the acoustical wave itself but some structure in the acoustical waves (e.g. the interaural time difference, or ITD, as in your example). There can also be invariants in a single event (I give the example of pitch here: http://briansimulator.org/perceptual-invariants-representational-vs-structural-theories/).