How is it possible to learn by imitation? Consider a child learning to speak. She reproduces a word spoken by an adult, for example “Mom”. How is this possible? At first sight, there seems to be an obvious answer: the child tries to activate her muscles so that the sound she produces is similar. But that’s the thing: the sound is not similar at all. A child is much smaller than an adult, which implies that: 1) the pitch is higher, 2) the whole spectrum of the sound is shifted towards higher frequencies (the “acoustic scale” is smaller). So if one were to compare the two acoustic waves, one would find little similarity, both in the time domain and in the spectral domain. Therefore, learning by imitation must be based on a notion of similarity that resides at a rather conceptual level, not on a direct comparison of sensory signals. Note that the sensorimotor account of perception (in this case the motor theory of speech) does not really help here, because it still requires explaining why the two vastly different acoustic waves should relate to similar motor programs. To be more precise: the two acoustic waves actually do relate to similar motor programs, but the child cannot observe the adult’s motor program. She has to relate the acoustic result of the adult’s motor program to her own motor program, even though the latter does not produce the same acoustic result. Could there be something in the acoustic wave that directly suggests the motor program?
This was the easy problem of imitation. But here’s a harder one: how can you imitate a smile? In this case, you can see the smile you want to imitate on the teacher’s face, but you cannot see your own smile. In addition, it seems unlikely that the ability is based on prior practice in front of a mirror. Thus, somehow, there is something in the visual signals that suggests the motor program. The visual and motor signals are completely different physical signals, so the resemblance must lie somewhere in their higher-order structure. This means that the perceptual system is able to extract an amodal notion of structure, and to compare two structures independently of their sensory origin.