Plato's Cave: A New Gestalt Model
Need for a New Approach
The failure of the conventional approach to visual perception to
address the Gestalt issues of perception suggests that the current
concepts of neural computation are inadequate, and that novel
principles and mechanisms of perceptual computation remain to be
discovered.
Perceptual Modeling
How are we to model perception if our understanding of the mechanism
of neural computation is incomplete? One way is to follow the example
of the Gestalt theorists, i.e. to study the percept itself, and model
its properties as it appears, i.e. perceptual modeling, as
opposed to neural modeling. Models derived in this way are
independent of the neural mechanism by which perception is subserved
in the brain. This approach will either converge with known
physiology, or if not, it will provide evidence as to where our
knowledge of neurophysiology must be extended to explain specific
properties of the percept.
An Evolutionary Progression
What follows is an evolutionary progression through a series of
models, beginning with one that reflects the properties of the
collinear percept observed in the Kanizsa figure. That model is then
progressively extended and generalized to account for ever more
properties observed in different perceptual phenomena, while taking
care not to lose the behavior evolved in previous stages.
At each stage in the progression you will find a simplistic model that
neatly explains some aspects of visual perception, while leaving many
fundamental issues unresolved. I ask your indulgence during this
progression by suspending your skepticism at least until we
reach the final model, at which point all the pieces will come
together and many of those issues will be resolved including, in my
view, many of the most fundamental issues in visual perception that
are not addressed by alternative approaches to this problem.
Collinear Boundary Completion
We begin by modeling collinear boundary completion as in the Kanizsa
figure.
Psychophysical studies
have shown that the illusory contour in this figure exhibits an
elastic spline-like quality in response to misalignment of the
inducers. This suggests an
analog Gestalt model
involving local interactions in some perceptual substrate.
This idea has been quantified in the
Directed Diffusion model
of collinear boundary completion which posits that visual edges
stimulate oriented fields of activation diffusing outward in a
collinear manner. When two such edges are aligned in a collinear
configuration, their diffusing fields of activation interact
spatially, resulting in a positive feedback which enhances the
illusory boundary along the smooth curve joining the inducers as shown
schematically below.
Computer Simulations
of the Directed Diffusion model confirm that it reproduces the
properties observed in the psychophysical studies.
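
The interaction between two diffusing fields can be illustrated with a
minimal one-dimensional sketch. This is not the Directed Diffusion
model itself: the exponential decay, the `decay` constant, the function
names, and the use of a simple product in place of positive feedback
are all assumptions chosen for illustration.

```python
import math


def directed_field(n_cells, origin, direction, decay=3.0):
    """Activation spreading from an edge that ends at cell `origin`.

    `direction` is +1 (spreads rightward) or -1 (spreads leftward).
    Exponential decay with distance is an illustrative assumption.
    """
    field = []
    for i in range(n_cells):
        distance = (i - origin) * direction
        field.append(math.exp(-distance / decay) if distance >= 0 else 0.0)
    return field


def collinear_completion(n_cells, left_end, right_start, decay=3.0):
    """Combine the fields of two collinear inducers that face each other.

    A product stands in for the positive feedback between the fields
    (an assumption): the result is nonzero only where both are present.
    """
    rightward = directed_field(n_cells, left_end, +1, decay)
    leftward = directed_field(n_cells, right_start, -1, decay)
    return [a * b for a, b in zip(rightward, leftward)]


narrow_gap = collinear_completion(20, left_end=8, right_start=12)
wide_gap = collinear_completion(20, left_end=4, right_start=16)
print(max(narrow_gap), max(wide_gap))
```

In this sketch the completed activation is confined to the gap between
the inducers and weakens as the gap widens. A two-dimensional version
with orientation-tuned fields would be required before anything like
the spline-like bending of the contour could appear.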
Immediately we can see that this operation is not abstraction, but the
inverse of abstraction, or
[reification],
a filling-in or interpolation of information not explicitly present in
the image. Even at this level we can begin to see exactly what it is
that is missing from the conventional models of vision.
Non-Collinear Boundary Completion
The mechanism of the Directed Diffusion model is specialized to
perform collinear completion. However, not all illusory contour
formation is collinear in nature. In the examples shown below, we see
the illusory contour complete through the sharp vertices of the
triangle on the left, we see the contour bifurcate in a series of "Y"
intersections at each dot in the pattern at the center, and we see an
illusory contour form orthogonal to, rather than collinear with,
the inducing lines of the Ehrenstein figure on the right.
A careful analysis of the behavior of the illusory contour in these
and other figures reveals certain underlying
laws of non-collinear completion
which offer clues as to the mechanism by which such completion is
subserved.
The
Orientational Harmonic model
generalizes the mechanism of the Directed Diffusion model in order to
account for these phenomena, and
computer simulations
of the model confirm that it predicts the illusory grouping in a wide
range of diverse phenomena.
In order to account for the data, the Orientational Harmonic model
introduces a novel computational mechanism in the form of harmonic
resonances that create patterns of standing waves which promote
the Gestalt properties of symmetry and periodicity.
As well as performing the non-collinear completion described above, harmonic
resonance also exhibits some unique and interesting properties as an
invariant representation
of visual form in visual abstraction.
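
One way to see why harmonics favor symmetry and periodicity is to
measure how strongly each harmonic of orientation responds to the edges
meeting at a vertex. The sketch below uses a plain Fourier
decomposition as a stand-in for standing waves; it is an illustrative
assumption, not the dynamics of the Orientational Harmonic model, and
the function names and `max_order` limit are hypothetical.

```python
import cmath
import math


def harmonic_amplitude(edge_angles, n):
    """Normalized strength of the n-th orientational harmonic at a vertex."""
    total = sum(cmath.exp(1j * n * theta) for theta in edge_angles)
    return abs(total) / len(edge_angles)


def dominant_harmonic(edge_angles, max_order=8, tol=1e-9):
    """Lowest harmonic order whose amplitude matches the maximum found."""
    amplitudes = [harmonic_amplitude(edge_angles, n)
                  for n in range(1, max_order + 1)]
    best = max(amplitudes)
    for order, amplitude in enumerate(amplitudes, start=1):
        if amplitude >= best - tol:
            return order


collinear = [0.0, math.pi]                            # two opposed edges
y_vertex = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]    # "Y" intersection
cross = [k * math.pi / 2 for k in range(4)]           # four-way crossing

# prints: 2 3 4
print(dominant_harmonic(collinear),
      dominant_harmonic(y_vertex),
      dominant_harmonic(cross))
```

Under these assumptions the collinear configuration is simply the
second harmonic of a family in which the "Y" vertex is the third, which
is the sense in which a harmonic mechanism generalizes collinear
completion.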
The Surface Brightness Percept
This model is not complete as it stands, because there are two
components to the percept, a collinear percept of an edge, and a
surface percept of the occluding triangle, which even has a different
surface brightness than the white page that it occludes. A complete
perceptual model must fill in this surface brightness.
Grossberg proposes the
Feature Contour System (FCS),
a mechanism which uses a spatial diffusion of brightness percept to
account for a wide variety of psychophysical phenomena including the
Kanizsa figure. On the basis of information theory I propose a
hierarchy of representation
in the visual system in which the FCS brightness percept represents
the lowest level.
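
The idea of a brightness percept that spreads by spatial diffusion can
be sketched in one dimension. The code below is not Grossberg's FCS
equations; it assumes a simple nearest-neighbor exchange that is
blocked wherever a boundary lies between two cells, and the names
`fill_in`, `boundaries`, `steps`, and `rate` are hypothetical.

```python
def fill_in(brightness, boundaries, steps=500, rate=0.25):
    """Diffuse brightness between neighbouring cells of a 1-D row.

    `boundaries` holds indices i such that a boundary lies between
    cell i and cell i + 1; no brightness flows across a boundary.
    """
    cells = list(brightness)
    for _ in range(steps):
        updated = cells[:]
        for i in range(len(cells) - 1):
            if i in boundaries:
                continue  # diffusion is blocked at a boundary
            flow = rate * (cells[i + 1] - cells[i])
            updated[i] += flow
            updated[i + 1] -= flow
        cells = updated
    return cells


# A single bright cell at the left, with a boundary after cell 3.
signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
filled = fill_in(signal, boundaries={3})
print([round(v, 3) for v in filled])
```

The brightness injected at one cell spreads evenly through the
compartment enclosed by the boundary and does not leak past it, which
is how a sparse edge signal could come to fill an entire surface.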
The Depth Component
Several investigators of the Kanizsa illusion have noted that the
percept involves more than just surface brightness: there is also an
illusion of occlusion, by a figure closer in depth, of figures farther
in depth; the triangle appears to "hover" some distance above the page.
Indeed,
certain illusory phenomena
contradict boundary and surface diffusion models that lack a provision
for an additional depth component. Moreover, there is
evidence
that the phenomena explained by these models occur as easily in depth
as they do in the plane of the visual field. These models must
therefore be either completely abandoned, or boldly extended into a
full depth representation. As absurd or improbable as this might at
first seem, I will show that this explanation makes sense of a great
number of hitherto mysterious perceptual phenomena.
The Bubble Model
represents an extension of the previous models into the third
dimension, using the Gestalt bubble concept, whereby the global
configuration of perceptual surfaces emerges by a relaxation of local
forces in an analog dynamic system.
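
Relaxation of local forces into a global configuration can be
illustrated with a deliberately reduced example: a one-dimensional
depth profile in which a few cells are pinned and every other cell
moves toward the mean of its neighbors. This is a generic Jacobi
relaxation used here as an assumption; it is not the Bubble Model, and
`relax_surface` and `anchors` are hypothetical names.

```python
def relax_surface(n_cells, anchors, steps=200):
    """Relax a 1-D depth profile toward local smoothness.

    `anchors` maps cell index -> fixed depth. Every other cell moves to
    the mean of its two neighbours on each step. Requires n_cells >= 2.
    """
    depth = [anchors.get(i, 0.0) for i in range(n_cells)]
    for _ in range(steps):
        updated = depth[:]
        for i in range(n_cells):
            if i in anchors:
                continue  # pinned cells keep their depth
            # At the ends of the row, mirror the single neighbour.
            left = depth[i - 1] if i > 0 else depth[i + 1]
            right = depth[i + 1] if i < n_cells - 1 else depth[i - 1]
            updated[i] = 0.5 * (left + right)
        depth = updated
    return depth


# Two pinned depths at the ends; the interior is left to relax.
profile = relax_surface(5, anchors={0: 0.0, 4: 1.0})
print([round(d, 3) for d in profile])
```

No cell is given the global shape, yet the profile settles into a
uniform slope between the pinned cells, purely through local
interactions.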
Bounding the Representation
The model so far represents a three-dimensional slice of the external
world in a fully spatial internal model. But the external world is an
essentially boundless space, while the internal representation must
fit inside the finite bounds of the skull.
The Bubble World
model explains how an illusion of boundlessness can be created in a
bounded spatial representation.
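
How an unbounded range of distances can fit in a bounded space is
easily shown with any saturating map. The arctangent below is my own
illustrative assumption, not the mapping used in the Bubble World, and
`compress_depth`, `radius`, and `half_depth` are hypothetical names.

```python
import math


def compress_depth(distance, radius=1.0, half_depth=10.0):
    """Map an external distance in [0, inf) to an internal radius.

    `half_depth` is the external distance that lands halfway to the
    bounding surface; the output approaches but never exceeds `radius`.
    """
    return radius * (2.0 / math.pi) * math.atan(distance / half_depth)


for d in (0.0, 10.0, 100.0, 1000.0):
    print(d, round(compress_depth(d), 3))
```

Equal steps in external distance occupy ever smaller steps in the
internal representation, so the farthest distances crowd together near
the bounding surface without ever crossing it.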
Lightness, Brightness, and Illuminance
A number of
perceptual phenomena
suggest that lightness, brightness, illuminance and form perception
are intricately intertwined.
With the addition of an internal illumination source, the Bubble World
becomes
the Lighted Bubble World,
which is now able to explain perceptual phenomena never before
addressed by computational models of perception.
Motion Perception and Shape From Motion
Some of the most
puzzling phenomena
in perception are seen in the amazing ability of the visual system to
make spatial sense out of moving stimuli. The spatial matrix of the
bubble model provides a
plausible explanation
for these phenomena.
Neurophysiological Considerations
Finally, a brief discussion of
where in the brain
such a spherical representation might be found.
Conclusion
Many models have been proposed to account for individual aspects of
perception, such as depth from disparity, shape from shading, shape
from motion, etc. with little thought given to generalizing those
models to account for other aspects of perception. This model takes
the opposite approach, and attempts to address all aspects of
perception with a single unified model. Of necessity, therefore, the
model is described rather sketchily, some of the mechanisms are highly
speculative, and many details remain to be worked out. What is
important, however, is not so much the specific details of this model,
but rather the functional principles that it embodies. In particular,
I refer to the principles of
- Top-down reification as a complementary operation to bottom-up
abstraction
- Invariance in recognition, specification in completion
- A fully spatial representation of a spatial world
These properties have been absent from previous models not for lack
of evidence of them in human perception, but rather for lack of a
mechanism to implement them. Indeed, it is difficult even to
present these concepts without presenting a specific mechanism whereby
they might be implemented. This model is therefore to be viewed not
so much as a specific proposal for the mechanism of perception, but
rather as an example system to illustrate that the above general
properties are achievable in principle in a physical
system.
While it may be argued that the proposed model seems highly
implausible in the light of current understanding of the
neurophysiology of vision, it should also be borne in mind that the
subjective experience of visual perception itself is highly
implausible in the light of current understanding of the
neurophysiology of vision, i.e. it is very hard to imagine how the
simple integrate-and-threshold model of the individual neuron can
possibly lead to the complex phenomenon of visual perception even when
millions of them are wired together. Indeed, no previous model has
even addressed the Gestalt principles with any success. Furthermore,
much of the modeling and analysis in vision science is founded on a
set of often unstated assumptions which themselves have never
been justified any more than have the assumptions underlying this
model.
The principle of Occam's Razor would suggest that the
simplest possible explanation is the one most likely to be
true. If a simple architecture is found neurophysiologically,
therefore, the simplest explanation is that of a simple functionality.
In biological evolution, however, there has always been an incentive to
make the most use of any existing hardware, to take
advantage, wherever possible, of subtle, higher order phenomena of the
physical system. By this converse principle of Occam's Beard,
therefore, I suggest that when a simple circuit is found in the brain,
the safest assumption is of the most complex functionality
that can possibly be elicited from that simple system.