Plato's Cave: The Bubble Model

The Bubble Model

The three-dimensional visual world is projected onto the two-dimensional retina (the optics problem). Perceptual filling-in can be considered a direct implementation of the inverse optics problem, i.e. expanding the two-dimensional retinal projection back into a solid three-dimensional model. In binocular vision this problem is more straightforward, because depth information is explicitly present in the disparity between the two eyes. Spatial perception, however, also works on monocular input alone, which is a far more formidable problem. The bubble model is therefore designed to work on either monocular or binocular input, using the same fundamental principles to reconstruct the most reliable three-dimensional percept.
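To make the projection and its inverse concrete, here is a minimal Python sketch assuming an idealized pinhole eye with focal length `f`, and for the binocular case two such eyes separated horizontally by `baseline`. The function names are illustrative only and are not part of the bubble model. It shows that a single image point leaves depth undetermined, whereas binocular disparity pins it down.

```python
# Assumes an ideal pinhole eye; all names here are hypothetical helpers.

def project(point, f=1.0):
    """Pinhole projection of a 3-D point (X, Y, Z) onto an image plane at distance f."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def back_project(uv, depth, f=1.0):
    """Inverse projection: the image point uv is consistent with a whole ray of
    3-D points; choosing a depth picks one point on that ray."""
    u, v = uv
    return (u * depth / f, v * depth / f, depth)


def disparity(point, baseline, f=1.0):
    """Horizontal image offset between a left eye at the origin and a right eye
    shifted by `baseline` along X (assumed stereo geometry)."""
    X, Y, Z = point
    return project(point, f)[0] - project((X - baseline, Y, Z), f)[0]


def depth_from_disparity(d, baseline, f=1.0):
    """Binocular case: depth is recovered directly as Z = f * baseline / d."""
    return f * baseline / d


# Monocular ambiguity: two different points on one line of sight share an image point.
near, far = (1.0, 2.0, 4.0), (2.0, 4.0, 8.0)
print(project(near), project(far))          # both (0.25, 0.5)
print(back_project(project(near), 8.0))     # (2.0, 4.0, 8.0), i.e. the far point
print(depth_from_disparity(disparity(near, 0.5), 0.5))  # 4.0
```

Only the choice of `depth` in `back_project` distinguishes the near point from the far one; that choice is exactly what monocular vision must supply by other means.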

A stimulus consisting of three lines meeting at a vertex, as shown below (left), evokes the percept of a corner of a cube, i.e. the intersection of three surfaces, either convex or concave, as shown below (second figure). Consider the problem in terms of perceptual modeling, i.e. consider the appearance of the percept itself rather than the neural mechanism that subserves it. An analog Gestalt model for this percept might be something like a piece of paper creased along three folds corresponding to the three visual edges, where each crease breaks the paper's stiffness, or resistance to bending, along that edge, as shown below (third figure). This would tend to make the paper "pop" into either a convex or a concave corner.
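The convex/concave ambiguity of this stimulus can be checked with a few lines of Python. This sketch assumes orthographic projection along the cube's main diagonal (both choices are simplifying assumptions, not part of the model): mirroring each edge of a convex corner through the image plane yields a concave corner whose three edges are still mutually perpendicular and whose image is exactly the same Y junction. Names such as `depth_reverse` are hypothetical.

```python
import numpy as np

# Unit vector pointing from the scene toward the viewer (assumed: along the cube diagonal).
VIEW = np.ones(3) / np.sqrt(3.0)


def project(v, n=VIEW):
    """Orthographic projection of a direction onto the image plane perpendicular to n."""
    return v - np.dot(v, n) * n


def depth_reverse(v, n=VIEW):
    """Mirror a direction through the image plane, flipping only its depth component."""
    return v - 2.0 * np.dot(v, n) * n


def image_angle(a, b):
    """Angle in degrees between two edges after projection into the image."""
    pa, pb = project(a), project(b)
    cos = np.dot(pa, pb) / (np.linalg.norm(pa) * np.linalg.norm(pb))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Convex corner: the vertex is nearest the viewer, edges recede along -x, -y, -z.
convex = -np.eye(3)
# Concave corner: the same three edges mirrored in depth, so they approach the viewer.
concave = np.array([depth_reverse(e) for e in convex])

print(image_angle(convex[0], convex[1]))           # ~120 degrees: the familiar Y junction
print(np.allclose(concave @ concave.T, np.eye(3)))  # True: mirrored edges stay orthogonal
```

Because the two readings project to identical images, nothing in the local stimulus can decide between them, which is why the creased-paper analogy has two equally good resting states.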

Extending this idea still further, consider the computational mechanism depicted to the right: a solid block of perceptual substrate that represents a slice of the visual world. Suppose that the "cells" or elements of that tissue can each be in one of two states, transparent or opaque, representing the absence or presence of a surface at that location and depth in the visual field. Collinearity and coplanarity constraints can be added to the behavior of this system as local interactions: when a cell is "active" or "opaque" at a certain depth and detects neighboring active units, it tends to propagate its activation to other cells lying in the same plane as most of those neighbors, while inhibiting activation in the direction orthogonal to that plane. Given such a system in an initially random state, a smooth planar surface will tend to emerge dynamically, with dynamics similar to those of a soap bubble. The coplanarity constraint gives the surface a certain dynamic "stiffness", or resistance to kinking, so that it bends only in smooth curves.
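A crude version of this relaxation can be simulated in Python. Two simplifying assumptions are made here, neither taken from the model itself: the block holds exactly one opaque cell per (row, column) column, so the surface is a depth map, and the coplanarity interaction is replaced by Laplacian smoothing, which pulls each cell's depth toward the mean of its four neighbors. Laplacian smoothing is a membrane (soap-film) rule rather than a true bending-stiffness term, so it flattens tilted planes too; with free borders a random start settles into a flat plane at the mean depth. `relax_depth_map` and `to_block` are hypothetical names.

```python
import numpy as np


def relax_depth_map(z, steps=200, rate=0.5):
    """Pull each column's surface depth toward the mean of its four neighbours.

    Assumption: Laplacian (soap-film) smoothing stands in for the coplanarity rule.
    """
    z = np.asarray(z, dtype=float).copy()
    for _ in range(steps):
        p = np.pad(z, 1, mode="edge")  # replicate borders: free, unpinned edges
        neighbour_mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        z += rate * (neighbour_mean - z)
    return z


def to_block(z, depth):
    """Voxelize: in each column the cell nearest the surface is opaque (True),
    every other cell is transparent (False). Assumes one surface per column."""
    layers = np.clip(np.rint(z).astype(int), 0, depth - 1)
    block = np.zeros(z.shape + (depth,), dtype=bool)
    rows, cols = np.indices(z.shape)
    block[rows, cols, layers] = True
    return block


rng = np.random.default_rng(0)
start = rng.uniform(0, 20, size=(16, 16))  # random initial depths in a 20-layer block
surface = relax_depth_map(start)
block = to_block(surface, depth=20)
print(start.std(), surface.std())          # the spread of depths collapses
```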

A visual edge in a two-dimensional image is constrained in X and Y, but is undetermined in the depth dimension Z. Therefore, in the inverse optics projection, the visual edge is inverse projected from a line into a plane in depth, so as to overlie every point in space that represents a possible location of that line. In the example above, the image of the three-line vertex is inverse projected from the front (near) surface of the block, producing three intersecting planes of influence that extend backwards through the depth of the perceptual block, as suggested by the gray planes in the figure above (right).
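The inverse projection step can be sketched in Python under two simplifying assumptions: projection is orthographic, so each plane of influence runs straight back perpendicular to the image, and the block is a boolean voxel array indexed (row, column, depth). `ray_mask` and `inverse_project` are hypothetical helpers.

```python
import numpy as np


def ray_mask(shape, origin, angle_deg, length):
    """Rasterize one arm of the vertex: a ray from `origin` (row, col) at
    `angle_deg` (0 = rightward, 90 = upward in the image)."""
    mask = np.zeros(shape, dtype=bool)
    row0, col0 = origin
    theta = np.radians(angle_deg)
    for t in np.linspace(0.0, length, 4 * length + 1):
        r = int(round(row0 - t * np.sin(theta)))  # image rows grow downward
        c = int(round(col0 + t * np.cos(theta)))
        if 0 <= r < shape[0] and 0 <= c < shape[1]:
            mask[r, c] = True
    return mask


def inverse_project(edge_mask, depth):
    """Sweep a 2-D edge mask straight back through `depth` layers of the block,
    turning each image line into a plane of influence (orthographic assumption)."""
    return np.repeat(edge_mask[:, :, np.newaxis], depth, axis=2)


# A Y-shaped vertex: three arms 120 degrees apart, meeting at the image centre.
image = np.zeros((31, 31), dtype=bool)
for angle in (90, 210, 330):
    image |= ray_mask(image.shape, (15, 15), angle, 12)

planes = inverse_project(image, depth=10)
print(planes.shape, image.sum(), planes.sum())
```

Every depth slice of `planes` is a copy of the image, which is the voxel equivalent of saying the edge is known in X and Y but not in Z. Under perspective projection the planes would instead fan out from the eye.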

In order to generate the proper percept, these planes would have to influence the behavior of active units in the block so as to relax their coplanarity constraint. The perceptual surface would then tend to kink or fold wherever it intersects the inverse projection of a visual edge. Additional local interactions could be defined to make the surface tend to fold into right-angled corners. In other words, it is possible to define a Gestalt-like analog dynamic mechanism that responds to a visual input like the one shown above by constructing a three-dimensional reification of the percept. The exact implementation is not as important as the fact that it is certainly possible to define a system with the required properties, and that such a system would correspond more closely to the actual percept than any of the conventional models proposed to account for vision. The orthogonality constraint, i.e. the tendency for the surface to fold at right angles, would be a natural property of a system defined in orientational harmonics.
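A one-dimensional cross-section through the surface shows how relaxing the constraint along an edge plane lets a fold survive. In this Python sketch, an illustration rather than the author's implementation, the 1-D form of the coplanarity rule is collinearity: each cell moves to the midpoint of its two neighbors. Cells on a crease are simply exempted from that rule, and the two end cells are pinned. The right-angle preference mentioned above is not modeled, and `relax_profile` is a hypothetical name.

```python
import numpy as np


def relax_profile(z, creases=(), steps=2000):
    """Collinearity relaxation of a 1-D depth profile through the surface.

    Each interior cell moves to the midpoint of its two neighbours. Cells listed
    in `creases` lie on the inverse projection of a visual edge and are exempt
    from the rule (assumption: exempt cells simply hold their depth), so the
    profile may kink there. Both end cells are pinned.
    """
    z = np.asarray(z, dtype=float).copy()
    free = np.ones(z.size, dtype=bool)
    free[0] = free[-1] = False
    for i in creases:
        free[i] = False
    for _ in range(steps):
        midpoints = 0.5 * (z[:-2] + z[2:])  # computed from the previous state (Jacobi update)
        z[1:-1] = np.where(free[1:-1], midpoints, z[1:-1])
    return z


x = np.arange(21)
folded = 10.0 - np.abs(x - 10)               # a fold whose apex sits at cell 10
with_crease = relax_profile(folded, creases=[10])
without_crease = relax_profile(folded)
```

With the crease at cell 10 the fold is already a fixed point and is kept unchanged; without it, the same profile relaxes to the flat line joining the pinned ends.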

Given a single local edge, therefore, the system would tend to pop into one of the configurations shown above: a convex corner, a concave corner, or the edge of one surface occluding another (two possible configurations, depending on which side is nearer). Given only local information, the system has no way of knowing which interpretation is correct. Given more global information, however, the forces in neighboring regions will influence the outcome at this local region, as described below. The system must therefore be able to "pop" between equally likely alternatives in a multi-stable manner, like the Necker cube, until more global evidence is acquired. Indeed, the multi-stable nature of percepts like the Necker cube provides strong evidence for this kind of reification mechanism.
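Multi-stability of this kind can be caricatured with a single state variable in a double-well energy landscape. The Python sketch below is a simplification of my own, not a formulation from the bubble model: `s` near +1 and -1 stand for two rival readings of the edge (say, convex and concave), and `bias` stands for the net influence of neighboring regions.

```python
def settle(s, bias=0.0, rate=0.01, steps=5000):
    """Gradient descent on a hypothetical energy E(s) = (s**2 - 1)**2 - bias * s.

    With bias = 0 there are two equally deep wells at s = +1 and s = -1;
    a large enough bias tilts the landscape until only one well remains.
    """
    for _ in range(steps):
        grad = 4.0 * s * (s * s - 1.0) - bias  # dE/ds
        s -= rate * grad
    return s


print(settle(0.1), settle(-0.1))  # local evidence alone: the sign of a tiny nudge decides
print(settle(-1.0, bias=2.0))     # strong context overrides the starting interpretation
```

With no bias, s = 0.1 settles at +1 and s = -0.1 at -1, so the outcome is decided by an arbitrarily small perturbation. Once the bias exceeds 8/(3√3) ≈ 1.54, the well at negative s disappears and every starting state ends on the favored side; `settle(-1.0, bias=2.0)` ends near 1.19. Spontaneous reversals like those of the Necker cube would need an extra ingredient such as noise or adaptation, which this sketch omits.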

Collinear Completion

In the Directed Diffusion model I introduced the notion of edges propagating in a collinear fashion by a spatial diffusion mechanism. In this three-dimensional extension of that idea, the edge is more like a fold or crease in a physical surface. The dynamics must be arranged so as to propagate such creases in a collinear manner through regions of no input. For example, an edge that fades away will create a crease that tends to propagate like a crack in a physical surface, as suggested below (left). The propagation of this crack will be strongly influenced by contextual cues, for example the presence of a crack growing from the opposite direction, or a texture of multiple visual edges that weakens the perceptual surface in a certain direction. A visual edge that terminates abruptly, on the other hand, suggests an occluding surface. Again, this behavior can be programmed into the model by defining the right set of local dynamics, as shown below (right): the strong edge directly adjacent to a strong absence of an edge creates a local stress in the perceptual surface, which tends to make the crack propagate behind the occluding surface, thus elevating that surface away from the original surface. The other boundaries of this occluding surface, however, remain unconstrained, and therefore produce no percept of its other edges.
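The interplay between a crack growing from one line end and a crack growing toward it from the opposite direction can be sketched along a single row of the surface. In this Python illustration the exponential `decay` per cell, the summing of the two directional waves, and the `threshold` for calling a cell cracked are all assumptions chosen for clarity, not parameters of the model; `crack_support` and `crack` are hypothetical names.

```python
import numpy as np


def crack_support(edge, decay=0.85):
    """Collinear crease support along one row of the perceptual surface.

    `edge` holds the visual-edge input per cell (1 = edge present, 0 = blank).
    Support spreads rightward and leftward from every edge cell, shrinking by
    `decay` per cell; the two directional waves are summed, so cracks growing
    toward each other reinforce one another (illustrative assumption).
    """
    edge = np.asarray(edge, dtype=float)
    n = edge.size
    rightward = np.zeros(n)
    leftward = np.zeros(n)
    for i in range(n):
        carried = decay * rightward[i - 1] if i > 0 else 0.0
        rightward[i] = max(edge[i], carried)
    for i in range(n - 1, -1, -1):
        carried = decay * leftward[i + 1] if i < n - 1 else 0.0
        leftward[i] = max(edge[i], carried)
    return rightward + leftward


def crack(edge, threshold=0.7):
    """Cells whose summed support reaches the (assumed) threshold are cracked."""
    return crack_support(edge) >= threshold


one_sided = np.zeros(20)
one_sided[0:6] = 1                  # an edge that stops after cell 5
two_sided = one_sided.copy()
two_sided[12:18] = 1                # a second, collinear edge facing it across a gap
print(np.flatnonzero(crack(one_sided)))
print(np.flatnonzero(crack(two_sided)))
```

With only the left edge, the crack grows two cells past the line end and stops; adding the facing edge closes the whole gap because the two waves reinforce each other.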

The presence of an abrupt line ending on the other side of this figure, however, would tend to complete the propagation of the crack behind the occluding surface. This further elevates the occluding surface over the occluded edge, while producing an amodal percept of an edge occluded by a nearer surface, as shown below (left). In the presence of yet another pair of abruptly terminating vertical lines, the occluding surface will be bounded on all four sides, which will induce it to pop into the foreground as a separate surface while producing an amodal percept of an occluded cross behind it. This would explain the appearance of the Ehrenstein figure shown below (right), as well as providing an explanation for the Gestalt principle of closure.
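The closure condition can be phrased geometrically. As a stand-in for the unspecified dynamics (an assumption, not the author's rule), treat a candidate occluding surface as closed when the abrupt line endings surround its center, meaning that no angular gap between neighboring endings, as seen from the center, reaches 180 degrees. `is_enclosed` is a hypothetical helper.

```python
import math


def is_enclosed(center, endings):
    """Illustrative closure test (assumed criterion): True when the line endings
    surround `center`, i.e. no angular gap between neighbouring endings, seen
    from the centre, reaches 180 degrees."""
    cx, cy = center
    angles = sorted(math.atan2(y - cy, x - cx) for x, y in endings)
    if len(angles) < 3:
        return False  # one or two endings can never surround a point
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) < math.pi


# Abrupt line endings around the empty centre of an Ehrenstein cross.
horizontal_pair = [(1.0, 0.0), (-1.0, 0.0)]
full_cross = horizontal_pair + [(0.0, 1.0), (0.0, -1.0)]

print(is_enclosed((0.0, 0.0), horizontal_pair))  # False: open above and below
print(is_enclosed((0.0, 0.0), full_cross))       # True: bounded on all four sides
```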

The exact dynamics and details of this model have not been fully worked out. But the principle behind this mechanism is clear, and it is also clear that such a system can in principle be defined. It is also clear (at least to me) that this kind of system captures the subjective impression of these illusory figures in a very compelling way.


Return to Steve Lehar