Plato's Cave: Inverse Perspective Transformation

The Inverse Perspective Transformation

The laws of perspective transform the parallel rails of a railroad track into converging lines that meet at the horizon. The job of the visual system is to undo this transformation and to restore the percept of parallel tracks, while giving the viewer information about the angle from which those tracks are being viewed.
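To make the forward transformation concrete, here is a minimal Python sketch under assumed conditions. It assumes a pinhole eye at the origin looking horizontally along +Z and a flat ground plane, and it uses illustrative values for eye height and rail gauge. The names `F`, `EYE_HEIGHT`, `GAUGE`, `project` and `rail_image` are hypothetical. The sketch shows the image gap between two parallel rails shrinking in proportion to 1/z while both rails rise toward the horizon, so they converge on a single vanishing point.

```python
# Hypothetical pinhole eye at the origin, looking along +Z with the y axis up.
# Image plane at z = F; the ground is the plane y = -EYE_HEIGHT.
F = 1.0
EYE_HEIGHT = 1.5   # metres (illustrative value)
GAUGE = 1.435      # separation of the rails in metres (illustrative value)


def project(x, y, z, f=F):
    """Perspective (pinhole) projection of the 3-D point (x, y, z) onto the image plane."""
    return (f * x / z, f * y / z)


def rail_image(x_offset, depths):
    """Image points of a straight rail lying on the ground at lateral offset x_offset."""
    return [project(x_offset, -EYE_HEIGHT, z) for z in depths]


depths = [2.0, 5.0, 10.0, 100.0, 1e6]
left = rail_image(-GAUGE / 2, depths)
right = rail_image(GAUGE / 2, depths)

for z, (lx, ly), (rx, _) in zip(depths, left, right):
    # The image gap between the rails shrinks like 1/z, and both rails
    # rise toward v = 0, the horizon at eye level: they converge on (0, 0).
    print(f"z = {z:>9.0f}   gap = {rx - lx:.6f}   height = {ly:.6f}")
```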

The architecture of the bubble world model performs this inverse perspective transformation automatically. In general terms, this is achieved by applying the same perspective transformation to an infinite Euclidean grid, or Cartesian coordinate system, and then using this "bent ruler" as the measure of straightness in the perspective-distorted world.
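The "bent ruler" idea has a close counterpart in computer vision called inverse perspective mapping. In that technique, image points are read back onto the ground plane through the inverse of the ground-to-image projection. The sketch below uses this standard technique as a stand-in to illustrate the geometry. It is not a model of the neural mechanism. The pinhole geometry, the values, and the helper names (`ground_to_image`, `image_to_ground`, `bent_ruler`) are assumptions.

```python
# Hypothetical pinhole eye at the origin looking along +Z (y up); ground plane y = -H.
F, H = 1.0, 1.5


def ground_to_image(X, Z):
    """Perspective image (u, v) of the ground point (X, -H, Z)."""
    return (F * X / Z, -F * H / Z)


def image_to_ground(u, v):
    """Inverse perspective mapping: ground coordinates (X, Z) of the point imaged at (u, v)."""
    if v >= 0:
        raise ValueError("point lies on or above the horizon, so it does not image the ground")
    Z = -F * H / v
    return (u * Z / F, Z)


# The "bent ruler": ground grid lines X = const pushed through the same perspective
# transformation. Each becomes an image line converging on the vanishing point (0, 0).
bent_ruler = {X: [ground_to_image(X, Z) for Z in (2.0, 4.0, 8.0, 16.0)] for X in (-1.0, 0.0, 1.0)}

# Converging lines as they appear in the image: sample points (u, v) along each rail.
rail_left = [(-0.35, -0.75), (-0.07, -0.15), (-0.035, -0.075)]
rail_right = [(0.35, -0.75), (0.07, -0.15), (0.035, -0.075)]

# Reading the image against the bent ruler recovers ground coordinates:
# each rail has a constant X, so the rails are parallel, 1.4 units apart.
left_ground = [image_to_ground(u, v) for u, v in rail_left]
right_ground = [image_to_ground(u, v) for u, v in rail_right]
for (xl, z), (xr, _) in zip(left_ground, right_ground):
    print(f"depth {z:5.1f}: left X = {xl:+.3f}, right X = {xr:+.3f}, gap = {xr - xl:.3f}")
```

The mapping is exact only for points that are assumed to lie on the chosen ground plane. The bent ruler makes the same commitment.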

The mechanism works as follows. We begin by inverse-projecting the image of the converging rails onto the front face of a slice of the perceptual mechanism described in the bubble model. This produces an inverted-"V"-shaped "peaked roof" of influence through the depth dimension, as shown below. The difference in the bubble world model is that the texture of neural connectivity, i.e. the direction along which collinearity and coplanarity operations tend to diffuse, is no longer rectilinear but follows a nonlinear pattern. In particular, the back plane represents perceptual infinity. This block of perceptual tissue therefore represents a rectangular slice from the periphery of the perceptual sphere, viewed from the inside. To describe the direction of this texture of connectivity within the block, consider the set of planes a through g below. All of these planes pass through a horizontal line at the vertex of the inverted "V". Planes a and g follow the back surface of the block and thus represent perceptual infinity. Plane d runs horizontally and passes through the "eye of the observer" at the center of the perceptual sphere.

The figure below shows the texture of connectivity within each of these planes, oriented as in the figure above, i.e. viewed from "below" for planes a through c and from "above" for planes d through g. The texture defines a set of converging lines, but the angle of convergence varies from plane to plane.

These converging lines represent the expected pattern of convergence of parallel edges on a surface viewed from various angles. For example, plane e represents the perspective view of a horizontal plane seen from slightly above, like a tiled floor viewed in perspective. Plane f represents that same horizontal plane viewed from a higher vantage point. Planes b and c represent the view of a surface from below, like a tiled ceiling overhead. Planes a and g represent the extreme case of a tiled wall seen at infinity, i.e. a bird's-eye view with no perspective distortion. Plane d represents the singularity of a plane viewed from within the plane, where everything collapses into a single horizon line.
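The relation between viewing angle and angle of convergence can be checked with a small calculation. The sketch below assumes a pinhole eye and a flat surface that crosses the line of sight at distance `D`, tilted so that it is seen at a given elevation. The elevations assigned to planes d, e, f and g are hypothetical, as are the function names. The code computes the opening angle of the "V" formed by two parallel edges on the surface. This angle shrinks from 180 degrees (edge-on, a single horizon line) toward zero (face-on, no convergence) as the vantage point rises.

```python
import math

# Hypothetical pinhole eye at the origin looking along +Z (y up), unit focal length.
# A flat surface crosses the line of sight at distance D and runs away from the eye
# in the direction t = (0, sin(elev), cos(elev)), where elev is the viewing elevation:
# 0 = surface seen edge-on, 90 = surface seen face-on (bird's-eye view).
D = 10.0   # illustrative viewing distance
W = 2.0    # illustrative separation of two parallel edges on the surface


def vanishing_point(elev):
    """Image of the point at infinity along t: where the parallel edges converge."""
    return (0.0, math.tan(elev))


def opening_angle_deg(elev):
    """Full angle of the 'V' formed by the edges at x = -W/2 and x = +W/2.

    Each edge runs from its near point (+-W / (2 D), 0) to the vanishing point (0, tan(elev)).
    """
    half = math.atan2(W / 2.0, D * math.tan(elev))
    return math.degrees(2.0 * half)


# Hypothetical elevations for some of the planes described in the text
# (89.9 approximates the face-on limit, where the vanishing point recedes to infinity).
PLANES = {"d": 0.0, "e": 15.0, "f": 45.0, "g": 89.9}
for name, elev_deg in PLANES.items():
    opening = opening_angle_deg(math.radians(elev_deg))
    print(f"plane {name}: elevation {elev_deg:4.1f} deg -> V opening {opening:6.2f} deg")
```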

In fact, this planar view of the connectivity in the perceptual sphere is a simplification. Within each plane there are many such vanishing points horizontally, as suggested below (left), and vertically there are many such sets of planes, as suggested below (center). The true situation is therefore something like the figure below (right). The external surface of the perceptual sphere is a continuum of intersecting vanishing-point vertices. Remember, however, that this surface is actually spherical, so the central axes of all of these vertices meet at the center of the perceptual sphere. Each vertex therefore represents the Cartesian grid rotated so that one of its major axes passes through that point. It thus represents the perspective view of that grid at that orientation, as seen from the origin.
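The claim that each vertex is a rotated Cartesian grid seen from the origin can be checked numerically. Consider a sphere centred on the eye. Every line parallel to a direction d (and not passing through the centre) projects onto a great-circle arc that runs between the two antipodal points ±d/|d|. A whole family of parallel grid lines therefore shares one vertex and its opposite, and the axis joining them passes through the centre. The sketch below projects a few parallel lines of an arbitrarily oriented grid onto a unit sphere. The direction, the offsets, and the helper names are illustrative assumptions.

```python
import math


def to_sphere(p):
    """Central projection of a 3-D point onto a unit sphere centred on the eye at the origin."""
    n = math.sqrt(sum(c * c for c in p))
    return tuple(c / n for c in p)


def point_on_line(base, direction, s):
    """The point base + s * direction."""
    return tuple(b + s * d for b, d in zip(base, direction))


# One axis of a rotated Cartesian grid (an arbitrary, illustrative direction).
direction = (1.0, 2.0, 2.0)
vertex = to_sphere(direction)            # predicted vanishing point, (1/3, 2/3, 2/3)
opposite = tuple(-c for c in vertex)     # the same lines vanish here in the other direction

# Several parallel grid lines: same direction, different (illustrative) offsets.
bases = [(0.0, 0.0, 5.0), (3.0, -1.0, 4.0), (-2.0, 4.0, 1.0)]
for base in bases:
    ahead = to_sphere(point_on_line(base, direction, 1e9))
    behind = to_sphere(point_on_line(base, direction, -1e9))
    err_ahead = max(abs(a - v) for a, v in zip(ahead, vertex))
    err_behind = max(abs(b - o) for b, o in zip(behind, opposite))
    print(f"line through {base}: error ahead {err_ahead:.1e}, behind {err_behind:.1e}")
```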

Now that I have defined more precisely the connectivity architecture of the perceptual sphere, it becomes clear how this architecture automatically performs the inverse perspective transformation. For when a pattern of converging lines is inverse-projected through this structure, it will coincide with only one of these many sets of converging lines. In our example, of all the multiple sets of converging lines in the system, only those in plane e match the angle of convergence of the rails in the image.
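The selection step can be sketched as template matching. The convergence measured in the image is compared against the convergence each candidate plane would produce, and the best fit is kept. In this Python sketch the candidate elevations, the viewing distance, the edge separation, and all function names are hypothetical. A single scalar stands in for the full convergence pattern: the slope of one edge as it runs toward the vanishing point.

```python
import math

# Hypothetical bank of candidate planes: signed viewing elevation in degrees
# (negative = surface seen from below, positive = seen from above), loosely
# following the planes a..g described in the text.
CANDIDATES = {"a": -90.0, "b": -60.0, "c": -20.0, "d": 0.0, "e": 20.0, "f": 60.0, "g": 90.0}
D, W = 10.0, 2.0   # illustrative viewing distance and separation of the parallel edges


def expected_edge_angle(elev_deg):
    """Angle (deg) above horizontal of the right-hand edge, running from its near end toward the vanishing point."""
    return math.degrees(math.atan2(D * math.tan(math.radians(elev_deg)), W / 2.0))


def observed_edge_angle(near, far):
    """The same angle, measured from two image points on the right-hand edge (near end first)."""
    (x0, y0), (x1, y1) = near, far
    return math.degrees(math.atan2(y1 - y0, x0 - x1))


def best_match(near, far):
    """Name of the candidate plane whose expected convergence best fits the observed edge."""
    obs = observed_edge_angle(near, far)
    return min(CANDIDATES, key=lambda name: abs(expected_edge_angle(CANDIDATES[name]) - obs))


def project(p):
    """Pinhole projection with unit focal length."""
    x, y, z = p
    return (x / z, y / z)


# Synthesise an image of the right-hand rail from a surface seen at 22 degrees elevation
# (a value the matcher does not know), then ask which candidate plane it matches.
true_elev = math.radians(22.0)
near = project((W / 2, 0.0, D))
far = project((W / 2, 50 * math.sin(true_elev), D + 50 * math.cos(true_elev)))
print(best_match(near, far))   # prints: e
```

A signed feature is used so that a surface seen from below, whose vanishing point lies below the edge's near end, cannot be confused with one seen from above at the same opening angle.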

An interesting feature of this system is that, besides abstracting this geometrical information from a pair of lines, it also tends to reify the surface suggested by those lines. In this case the stimulus of the converging lines in plane e will tend to propagate coplanar activation throughout plane e, thus filling in every point in the surface suggested by those lines. The filled-in surface will automatically be represented at the correct distance and orientation relative to the egocentric point at the center of the perceptual sphere. Any additional visual cues that are consistent with the perceived surface would strengthen that percept and give it a more precise perceived form and location. Any cues that are inconsistent with it would tend to suppress the percept of that surface.
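In geometric terms, the filled-in surface corresponds to intersecting every line of sight with the plane selected by the matching step. This places each surface point at a definite distance and orientation from the eye. The Python sketch below does this with a plain ray-plane intersection. It illustrates the geometry only, not the proposed neural propagation. The pinhole setup, the parameter values, and the `fill_in` name are assumptions.

```python
import math


def fill_in(elev_deg, distance, pixels):
    """Back-project image points onto the surface implied by a matched convergence pattern.

    Hypothetical geometry: pinhole eye at the origin looking along +Z (y up, unit focal
    length); the surface passes through (0, 0, distance) and is seen at elevation elev_deg,
    which must lie in (0, 90]. Returns {pixel: 3-D point}, omitting pixels whose rays
    never reach the surface (those at or beyond its horizon).
    """
    a = math.radians(elev_deg)
    normal = (0.0, -math.cos(a), math.sin(a))
    offset = distance * math.sin(a)          # plane equation: normal . P = offset
    surface = {}
    for u, v in pixels:
        denom = normal[1] * v + normal[2]    # normal . (u, v, 1); normal has no x component
        if denom <= 1e-12:
            continue                          # line of sight never meets the surface
        lam = offset / denom                  # the ray lam * (u, v, 1) meets the plane here
        surface[(u, v)] = (lam * u, lam * v, lam)
    return surface


pixels = [(i / 10, j / 10) for i in range(-5, 6) for j in range(-5, 6)]
surface = fill_in(20.0, 10.0, pixels)
for (u, v), (x, y, z) in sorted(surface.items())[:3]:
    dist = math.dist((0.0, 0.0, 0.0), (x, y, z))
    print(f"pixel ({u:+.1f}, {v:+.1f}) -> point ({x:6.2f}, {y:6.2f}, {z:6.2f}), distance {dist:.2f}")
```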

This mechanism now offers an explanation for the Ponzo illusion. In this illusion the two vertical lines appear to have different lengths because the converging lines suggest a depth interpretation. The line nearer the point of convergence is placed farther away in depth and is therefore perceived as larger.

The Ponzo Illusion
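The depth interpretation can be turned into a small calculation. Suppose the converging lines are read as a receding surface. Two segments of equal image length laid across those lines then correspond to different lengths on that surface, because the one nearer the vanishing point lies farther from the eye. The sketch assumes a pinhole eye, a flat surface, and illustrative parameter values, and all names are hypothetical. It uses the same ray-plane intersection as standard inverse perspective mapping rather than anything specific to the bubble model.

```python
import math

# Hypothetical geometry: pinhole eye at the origin looking along +Z (y up, unit focal
# length). The converging lines are read as a surface seen at elevation ALPHA whose
# point straight ahead lies at distance D. (For a figure rotated by 90 degrees, swap
# the roles of the image coordinates u and v.)
ALPHA = math.radians(20.0)
D = 10.0


def depth_scale(v):
    """Factor lam such that the ray lam * (u, v, 1) meets the inferred surface (same for every u)."""
    denom = math.sin(ALPHA) - v * math.cos(ALPHA)
    if denom <= 0:
        raise ValueError("row lies at or beyond the horizon of the surface")
    return D * math.sin(ALPHA) / denom


def length_on_surface(image_length, v):
    """Length on the inferred surface of a segment lying across the converging lines at image row v."""
    return image_length * depth_scale(v)


horizon = math.tan(ALPHA)                       # image row of the vanishing point
near_bar = length_on_surface(0.2, v=-0.2)       # far from the vanishing point
far_bar = length_on_surface(0.2, v=0.25)        # close to the vanishing point
print(f"horizon at v = {horizon:.3f}; equal image bars -> {near_bar:.2f} and {far_bar:.2f}")
```

With these values, the bar nearer the point of convergence maps to a segment about five times longer on the surface. This gives the direction of the Ponzo effect, not its magnitude.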

This mechanism also explains the Müller-Lyer illusion, in which vertical lines of the same length appear to be of different lengths. Again, this illusion makes sense in a spatial context. The line that is made to appear farther in depth becomes perceptually larger, while the one that is made to appear closer in depth becomes smaller.

The Müller-Lyer Illusion

Return to Steve Lehar