Figure 2 (A)
Consider now a different robot in a similar room, but in this case the "black box" contains a specialized spatial mechanism which somehow transforms the video image into a miniature three-dimensional model of the surrounding room. This model is like an architect's cardboard model rendered in full color and texture, complete with a miniature representation of the robot itself at the center of the miniature room, as shown in Figure 2 (B).
Figure 2 (B)
This is reminiscent of the "Cartesian Theatre" concept, but with an important distinction: the little robot at the center of the representation is not a miniature observer of the internal theatre, since this would result in an infinite regress of observers within observers. Instead, the little robot is an integral part of the internal representation, constructed of the same perceptual material as the rest of the internal model, without which the internal model would be incomplete, for the body of the robot is also an object in the room. This model requires no internal observer; rather, the miniature internal model is part of a mechanism used for making spatial judgements. For example, an intention to head towards the door in Figure 2 (A) might be represented by a spring or magnet between the miniature copy of the robot and the miniature copy of the door in Figure 2 (B), which would tend to pull the miniature robot towards the miniature door. This in turn would cause the miniature wheels to turn, and that turning would represent a motor signal to command the actual wheels to turn. The external world and its internal representation are closely coupled by visual and other sensory systems, in order to prevent any misalignment between the external and internal worlds. For example, if the real robot were to encounter an obstacle, like the block depicted in Figure 2 (A), which would stop its intended forward progress, sensors on the robot's wheels would indicate that the wheels had stopped turning, and this, in turn, would stop the turning of the miniature wheels in the internal model, and thereby stop the perception of forward motion.
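The coupling described above can be made concrete with a toy simulation. What follows is only a minimal sketch under strong simplifying assumptions, not a proposal for how such a mechanism is actually implemented: the room is reduced to a single dimension, the "spring" is a linear pull with an arbitrary stiffness, and all names (World, InternalModel, run) are hypothetical ones introduced for illustration. The point it shows is that the miniature robot advances only by the wheel rotation that is actually sensed, so the internal model halts when the real robot meets the block.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class World:
    """The external room, reduced to one dimension (an assumption)."""
    robot_x: float
    door_x: float
    block_x: Optional[float] = None  # position of an obstacle, if any

    def drive(self, command: float) -> float:
        """Apply a wheel command and return the rotation the wheel sensors report."""
        target = self.robot_x + command
        # The block stops forward progress: the robot cannot pass it.
        if self.block_x is not None and self.robot_x <= self.block_x < target:
            target = self.block_x
        moved = target - self.robot_x
        self.robot_x = target
        return moved


@dataclass
class InternalModel:
    """The miniature replica: a miniature robot and a miniature door."""
    mini_robot_x: float
    mini_door_x: float
    stiffness: float = 0.5  # assumed spring constant, chosen arbitrarily

    def motor_command(self) -> float:
        """The spring between miniature robot and miniature door yields the command."""
        return self.stiffness * (self.mini_door_x - self.mini_robot_x)

    def update(self, sensed_rotation: float) -> None:
        """The miniature wheels turn only as far as the real wheels were sensed to turn."""
        self.mini_robot_x += sensed_rotation


def run(world: World, model: InternalModel, steps: int) -> None:
    """Close the loop: intention -> motor signal -> world -> sensors -> model."""
    for _ in range(steps):
        command = model.motor_command()
        sensed = world.drive(command)
        model.update(sensed)
```

With no obstacle, the miniature robot and the real robot converge on the door together. With a block placed between robot and door, the spring continues to pull, but the sensed rotation falls to zero and the miniature robot stops where the real one does, which is the sense in which the perception of forward motion stops.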
Of the two robots described above, the spatial and the non-spatial, which one more accurately reflects the nature of perception in biological systems? Is the physical world a sufficient model of itself, or is it necessary for the brain to construct an elaborate spatial replica of the external world? I will present evidence from a number of diverse sources that clearly supports the latter view.