Plato's Cave: Marr's Vision

Marr's Vision

Marr proposed a model of visual processing that begins by identifying the "zero-crossings" (edges) in the image, then uses this edge information to provide a crude segmentation of surfaces, called the 2-1/2-D sketch, and finally extracts from this sketch the three-dimensional spatial information. That spatial interpretation is expressed in terms of geometrical primitives such as generalized cylinders or cones, so that the only data that must be explicitly stored are the x, y, z locations, the alpha, beta, gamma orientations, the aspect ratios, etc. of each of the cylinders, together with a symbolic code of the relations between them. The complex scene is thereby reduced to a highly compressed set of meaningful numbers.
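To make the first stage concrete, here is a minimal sketch of zero-crossing detection on a one-dimensional intensity profile. It assumes the usual reading of the term: the signal is filtered with the second derivative of a Gaussian, and an edge is marked wherever the filtered response changes sign. The function names, the default sigma, and the tolerance are illustrative assumptions, not anything specified in the text above.

```python
import math


def second_derivative_of_gaussian(sigma, radius):
    """Sample the second derivative of a 1-D Gaussian at integer offsets."""
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-(x * x) / (2.0 * sigma * sigma))
        kernel.append((x * x - sigma * sigma) / sigma ** 4 * g)
    # Subtract the mean so the kernel sums to (nearly) zero and a
    # uniform region produces (nearly) no response.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]


def filter_signal(signal, kernel):
    """Apply a symmetric kernel, repeating the border samples at the ends."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out


def zero_crossings(signal, sigma=1.0, tol=1e-9):
    """Return each index i where the response changes sign between i and i+1.

    Responses smaller than tol are treated as zero, so rounding noise in
    flat regions is not reported as an edge. A crossing that lands exactly
    on a sample (response of zero there) is not reported by this sketch.
    """
    radius = max(1, math.ceil(3 * sigma))
    response = filter_signal(signal, second_derivative_of_gaussian(sigma, radius))

    def sign(v):
        if abs(v) < tol:
            return 0
        return 1 if v > 0 else -1

    return [i for i in range(len(response) - 1)
            if sign(response[i]) * sign(response[i + 1]) < 0]


# A dark region followed by a bright region: one step edge between 9 and 10.
step = [0.0] * 10 + [10.0] * 10
print(zero_crossings(step))
```

On a clean step like this the edge is easy to find. The sketch says nothing about how noise, shading, and texture in a real image would be handled.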

Notice that in this model the three-dimensional spatial information is the last stage of processing.
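What that last stage is supposed to deliver can be sketched as a small data structure. The following is a hypothetical encoding: the location and orientation fields follow the parameters listed above, while the length field, the relation name, and all class and method names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GeneralizedCylinder:
    # Location of the primitive in space.
    x: float
    y: float
    z: float
    # Orientation angles.
    alpha: float
    beta: float
    gamma: float
    # Shape parameters (length is an assumed extra parameter).
    length: float
    aspect_ratio: float


@dataclass
class SceneDescription:
    cylinders: List[GeneralizedCylinder] = field(default_factory=list)
    # Symbolic relations as (index_a, relation_name, index_b) triples.
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def parameter_count(self) -> int:
        """Number of stored numbers: eight per cylinder in this sketch."""
        return 8 * len(self.cylinders)


# Hypothetical two-part object: a torso with one arm attached to it.
scene = SceneDescription()
scene.cylinders.append(GeneralizedCylinder(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.4))
scene.cylinders.append(GeneralizedCylinder(0.3, 0.8, 0.0, 0.0, 0.0, 1.2, 0.7, 0.15))
scene.relations.append((1, "attached_to", 0))
print(scene.parameter_count())
```

The point of the representation is its compactness: a whole object reduces to a handful of numbers plus a few symbolic relations. Writing the structure down is the easy part; filling it in from an image is the hard part.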

The problem with this model is that nobody has ever been able to define how this spatial information can be reliably extracted from the scene. Again, the visual world contains far too much ambiguity to be handled successfully in this manner.
