In his book “Vision”, David Marr briefly comments on James Gibson’s ecological approach, and rejects it. He makes a couple of criticisms that I think are fair, for example the fact that Gibson seemed to believe that extracting meaningful invariants from sensory signals is somehow trivial, while it is a difficult computational problem. But David Marr seems to have missed the important philosophical points in James Gibson’s work. These points have also been made by others, for example Kevin O’Regan, Alva Noë, but also Merleau-Ponty and many others. I will try to summarize a few of these points here.
I quote from David Marr: “Vision is a process that produces from images of the external world a description that is useful to the viewer and not cluttered with irrelevant information”. There are two philosophical errors in this sentence. First, that perception is the production of a representation. This is a classical philosophical mistake, the homunculus fallacy. Who then sees this representation? Marr even explicitly mentions a “viewer” of this representation. One would have to explain the perception of this viewer, and this reasoning leads to an infinite regress.
The second philosophical mistake is more subtle. It is to postulate that there is an external source of information, the images on the retina, that the sensory system interprets. This is made explicit later in the book: “(...) the initial representation is in no doubt – it consists of arrays of image intensity values as detected by the photoreceptors in the retina”. This assumption is precisely what Gibson questions at the very beginning of his book, The Ecological Approach to Visual Perception. Although it is convenient to speak of information in sensory signals, it can be misleading. It suggests a parallel with Shannon’s theory of communication, but the environment does not communicate with the observer. Surfaces reflect light waves in all directions. There is no message in these waves. So the analogy between a sensory system and a communication channel is misleading. The fallacy of this view is fully revealed when one considers the voluntary movements of the observer. The observer can decide to move and capture different sensory signals. In Gibson’s terminology, the observer samples the ambient optic array. So what is primary is not the image, it is the environment. Gibson insists that a sensory system cannot be reduced to the sensory organ (say, the eyes and the visual cortex). It must include active movements, embedded in the environment. This is related to theories of embodiment.
We tend to feel that what we see is like the image of a high-resolution camera. This is a mistake due to the immediate availability of visual information (by eye movements). In reality, only a very small part of the visual field has high resolution, and one part of the retina has no photoreceptors at all (the blind spot). We do not feel this because when we need the information, we can immediately direct our eyes towards the relevant target in the visual field. There is no need to postulate that there is an internal high-resolution representation in which we can move our “inner eye”. Rodney Brooks, a successful researcher in artificial intelligence and robotics, once stated “the world is its own best model”. The fact that we actually do not have a high-resolution mental representation of the visual world (an image in the mind) has been demonstrated spectacularly through the phenomena of change blindness and inattentional blindness, in which a major change in an image or movie goes unnoticed (see for example this movie).
Thank you for this interesting message. The argument is very O'Reganian sounding, but I quite agree with it. Nevertheless, is this debate really important? Despite Marr's philosophical errors, as you point out, his book and his work were really the starting point of computational vision, and you could even say of computational neuroscience. Moreover, they are still very influential nowadays. Interestingly, you can read in textbooks that computer vision is basically an application of Gibson's philosophy. So who is right, who is wrong, if in the end both converge?
There are definitely relationships between the two. I wrote another post that is more specific about them.
In fact, a good analysis was provided by Francisco Varela. He describes Marr's approach as "computational objectivism". The main difference with Gibson's approach, and even more with enactive and sensorimotor approaches, is that embodiment is intrinsic to the definition of the computational goal. In a discussion about color, Varela shows that Marr-type approaches cannot explain anything about the phenomenology of color, because they postulate that color is the reflectance spectrum of surfaces, which is observer-independent. But Marr's approach improves upon "neurophysiological subjectivism", which downplays the teleonomic (i.e. functional) nature of perceptual systems.