Plato's Cave: Biederman's Geon Theory
Biederman notes that certain properties of visual features remain
invariant under perspective transformation through small angles. For
example, a straight edge appears straight, and a curved edge appears
curved, through a wide range of rotations of the object, although the
exact angle or curvature of that edge changes with rotation.
Biederman therefore proposes Geon Theory, a representation of visual
form in terms of these relatively invariant features.
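The invariance claim can be checked numerically. What follows is a minimal sketch under assumed conditions, not Biederman's own procedure. It uses a simple pinhole camera (the `focal` and `depth` parameters are assumptions), rotates the object about the vertical axis, and measures how far the image of an edge departs from the straight chord joining its end points. The helper names and both sample edges are hypothetical.

```python
import math


def rotate_y(p, theta):
    """Rotate a 3D point about the vertical (y) axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def project(p, focal=1.0, depth=5.0):
    """Pinhole perspective projection; the object sits `depth` units in front of the camera."""
    x, y, z = p
    z += depth
    return (focal * x / z, focal * y / z)


def view(points3d, theta):
    """Image of a set of 3D points after rotating the object by theta."""
    return [project(rotate_y(p, theta)) for p in points3d]


def max_deviation(pts):
    """Largest perpendicular distance of any image point from the chord joining the end points."""
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in pts)


def orientation_deg(pts):
    """Image-plane orientation of the chord, in degrees."""
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


# Hypothetical test shapes: a straight 3D edge and a quarter-circle arc, 21 samples each.
a, b = (-1.0, -0.5, -0.5), (1.0, 0.8, 0.5)
straight_edge = [tuple(a[i] + t / 20 * (b[i] - a[i]) for i in range(3)) for t in range(21)]
curved_edge = [(math.cos(t / 20 * math.pi / 2), math.sin(t / 20 * math.pi / 2), 0.0)
               for t in range(21)]

for deg in (0, 20, 40, 60):
    theta = math.radians(deg)
    s_img, c_img = view(straight_edge, theta), view(curved_edge, theta)
    print(f"{deg:2d} deg: straight dev={max_deviation(s_img):.1e}, "
          f"angle={orientation_deg(s_img):5.1f} deg, curved dev={max_deviation(c_img):.3f}")
```

Under these assumptions, the straight edge's deviation stays at floating-point noise for every angle while its image orientation drifts by a few degrees. The quarter-circle arc stays visibly curved. Perspective projection maps 3D lines to image lines, so straightness survives any rotation. The arc loses its curvature only in the degenerate edge-on view (here, a 90 degree rotation), which is one reason the invariance is stated for a range of rotations rather than all of them.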
A simplified example of this concept is illustrated below, in which
each object is encoded as a four-digit code. The first digit encodes
whether the edges are straight or curved; the second digit encodes
whether the object has reflection symmetry, rotation symmetry, or
both; the third digit encodes how the cross-section of the object
changes in size along its central axis (its sweep: constant,
expanding, or contracting); and the fourth digit encodes whether that
central axis is straight or curved.
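The four-digit scheme can be sketched as one lookup table per digit. Only the digit values that appear in the teapot example below come from the text: 1 for straight edges, 3 for reflection and rotation symmetry, 0/1/2 for constant/expanding/contracting sweep, and 0/1 for a straight/curved axis. The remaining values (0 for curved edges, 1 and 2 for a single kind of symmetry) are hypothetical assignments, and `decode` and `encode` are illustrative names.

```python
# One table per digit. Entries tagged "hypothetical" are assumed;
# the rest follow the teapot example in the text.
EDGE = {0: "curved-edged",                       # hypothetical
        1: "straight-edged"}
SYMMETRY = {1: "reflection symmetric",           # hypothetical
            2: "rotation symmetric",             # hypothetical
            3: "reflection & rotation symmetric"}
SWEEP = {0: "constant sweep", 1: "expanding sweep", 2: "contracting sweep"}
AXIS = {0: "straight central axis", 1: "curved central axis"}
FIELDS = (EDGE, SYMMETRY, SWEEP, AXIS)


def decode(code):
    """Turn a four-digit geon code such as '1301' into its four descriptions."""
    if len(code) != len(FIELDS) or not code.isdigit():
        raise ValueError(f"expected a {len(FIELDS)}-digit code, got {code!r}")
    try:
        return [table[int(d)] for table, d in zip(FIELDS, code)]
    except KeyError as err:
        raise ValueError(f"unknown digit value in {code!r}") from err


def encode(*descriptions):
    """Inverse of decode: map four descriptions back to their digit code."""
    if len(descriptions) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} descriptions")
    digits = []
    for table, text in zip(FIELDS, descriptions):
        inverse = {v: k for k, v in table.items()}
        digits.append(str(inverse[text]))  # raises KeyError for an unknown description
    return "".join(digits)


for code in ("1301", "1310", "1321"):
    print(code, "->", ", ".join(decode(code)))
```

Keeping each digit independent makes the code easy to invert. It also means that the code describes the part itself rather than any particular view of it.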
Relations between objects can be encoded in a similar manner, allowing
objects to be decomposed into their component pieces and yielding a
compressed code for each complex object. For example, the teapot shown
above would be encoded by its three component objects (handle, body,
spout) and the relations between them as follows:
- 1301: straight-edged, reflection & rotation symmetric, constant
sweep, curved central axis
- <037>: smaller than, beside, join both ends to side
- 1310: straight-edged, reflection & rotation symmetric, expanding
sweep, straight central axis
- <136>: bigger than, beside, join side to end
- 1321: straight-edged, reflection & rotation symmetric,
contracting sweep, curved central axis
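A structural description like the teapot's can be held as a small graph of components and relations. The following is a minimal sketch only. The class and function names (`Relation`, `chain_signature`, `recognize`, `MODELS`) are hypothetical, and flattening the description into one string works only for chain-shaped objects like this teapot. Recognition then becomes a lookup of that string. Because the string is built from part codes rather than from image measurements, it is, by the theory's assumption, the same for every view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    first: str   # name of the first component
    code: str    # three-digit relation code, e.g. "037"
    second: str  # name of the second component


# The teapot from the text: a geon code per component plus the two relations.
TEAPOT_PARTS = {"handle": "1301", "body": "1310", "spout": "1321"}
TEAPOT_RELATIONS = [Relation("handle", "037", "body"),
                    Relation("body", "136", "spout")]


def chain_signature(parts, relations):
    """Flatten a chain of components into one compact string.

    Assumes each relation starts at the component where the previous one ended.
    Component names do not appear in the result, only their codes.
    """
    if not relations:
        raise ValueError("need at least one relation")
    out = [parts[relations[0].first]]
    prevs = [None] + list(relations[:-1])
    for prev, rel in zip(prevs, relations):
        if prev is not None and prev.second != rel.first:
            raise ValueError("relations do not form a chain")
        out.append(f"<{rel.code}>{parts[rel.second]}")
    return "".join(out)


# A hypothetical model library keyed by signature.
MODELS = {chain_signature(TEAPOT_PARTS, TEAPOT_RELATIONS): "teapot"}


def recognize(parts, relations):
    """Return the model name whose signature matches exactly, or None."""
    return MODELS.get(chain_signature(parts, relations))


print(chain_signature(TEAPOT_PARTS, TEAPOT_RELATIONS))  # 1301<037>1310<136>1321
```

Exact string matching stands in here for the tolerant graph matching a real recognizer would need. It also hints at the difficulty raised in the next paragraph: a shape built from hundreds of parts has no compact chain to look up.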
These properties would remain relatively invariant through small
rotations of the object.
The first problem with this scheme is that nobody has ever devised a
system that can perform this encoding on a natural scene. Even if
someone could, certain natural shapes such as trees, shrubs, grass,
hair, and rocks cannot be expressed in this manner, because they are
composed of far too many components. Furthermore, the highly
compressed abstract code is nothing like the subjective percept of a
teapot, which appears complete in all of its curves and surfaces.
Finally, in this scheme the three-dimensional information is the last
to be computed, and it is computed only in the most abstract manner.
Evidence from visual illusions suggests instead that the
three-dimensional percept is the first to be computed, and that the
percept is not an abstraction but is filled in with perceptual
surfaces and boundaries in a three-dimensional form.
Return to Steve Lehar