Plato's Cave: Invariant Representation

Bottom-Up Invariant Recognition

In the orientational harmonic system there is an interesting relationship between the spatial pattern of orientations produced by the standing waves and the temporal frequencies at which those standing waves oscillate.

The temporal frequency of a standing wave is equal to the speed of sound in the medium divided by the wavelength. Since the speed of sound is a constant, the temporal frequency is inversely proportional to the wavelength, and therefore directly proportional to the harmonic number. A fourth harmonic standing wave, for example, will produce twice the temporal frequency of a second harmonic, which in turn is twice the frequency of the first harmonic waveform. The significance of this relationship is that the temporal frequency is a non-spatial quantity, represented by a single value, while the corresponding standing wave is a spatial pattern with a spatial, or orientational, frequency and a phase, or absolute orientation. A single temporal frequency in the orientational harmonic system therefore represents all the possible orientations of the corresponding orientational frequency pattern in a many-to-one manner.
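
The harmonic relationship can be written out explicitly. The following is a sketch under assumptions that are not stated in the text: the resonator is taken to be a ring of circumference $L$, waves travel around it at speed $c$, and the $n$th harmonic fits exactly $n$ wavelengths around the ring. The symbols $L$, $c$, $A$, and $\varphi$ are introduced here for illustration only.

```latex
\begin{align*}
\lambda_n &= \frac{L}{n}, &
f_n &= \frac{c}{\lambda_n} = \frac{n\,c}{L}, &
\frac{f_4}{f_2} &= \frac{f_2}{f_1} = 2, \\
u_n(\theta, t) &= A \cos\bigl(n(\theta - \varphi)\bigr)\,\cos(2\pi f_n t).
\end{align*}
```

The orientation $\varphi$ appears only in the spatial factor of $u_n$, and $f_n$ does not depend on it. Every rotation of the $n$th harmonic pattern therefore sounds the same tone, which is the many-to-one mapping described above.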

Consider a bank of tuned filters set to respond specifically to the harmonics of a circular harmonic system. The second harmonic filter, for example, would respond equally to a second harmonic pattern at any orientation, as shown below. These tuned filters therefore provide a rotation-invariant representation of the pattern of standing waves in the system.
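
The rotation invariance of such a filter bank can be illustrated numerically. The sketch below is not the author's simulation: it assumes that a standing wave pattern can be modeled as a cosine sampled around a ring, and that each tuned filter reports the magnitude of one angular Fourier coefficient. The names ring_pattern and filter_bank are hypothetical.

```python
import cmath
import math

def ring_pattern(harmonic, orientation, samples=64):
    """Sample cos(harmonic * (theta - orientation)) at evenly spaced angles."""
    return [math.cos(harmonic * (2 * math.pi * k / samples - orientation))
            for k in range(samples)]

def filter_bank(pattern, max_harmonic=4):
    """Return the magnitude of each angular harmonic 1..max_harmonic.

    Taking the magnitude discards the phase of the Fourier coefficient,
    and the phase is where the orientation of the pattern is stored.
    """
    n = len(pattern)
    responses = {}
    for h in range(1, max_harmonic + 1):
        coeff = sum(v * cmath.exp(-1j * h * 2 * math.pi * k / n)
                    for k, v in enumerate(pattern)) / n
        responses[h] = abs(coeff)
    return responses

# The same second harmonic pattern at two different orientations.
upright = filter_bank(ring_pattern(2, 0.0))
rotated = filter_bank(ring_pattern(2, math.radians(37)))
print(upright[2], rotated[2])  # both are 0.5 up to rounding error
```

In this sketch only the second harmonic filter responds to either pattern, and its response is the same at both orientations.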

Since the patterns are themselves expressed in terms of angles, and angles by their nature are invariant to scale, the orientational representation is fundamentally invariant to changes in scale, so again, the system has a natural scale invariance property.

Finally, translation invariance can also be added to the system by replicating the harmonic units in a spatial array, and arranging global tuned filters which can receive input from all of the units in the array simultaneously. The presence of a third harmonic vertex anywhere in this array would therefore stimulate the third harmonic filter in a rotation, translation, and scale invariant manner.
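
A minimal sketch of this pooling arrangement follows. It assumes that each harmonic unit can be summarized by the orientations of the edges radiating from its center, and that a global filter takes the maximum response over all units; both the data layout and the choice of maximum as the pooling rule are assumptions made for illustration, and the function names are hypothetical.

```python
import cmath
import math

def unit_response(edge_angles_deg, harmonic):
    """Magnitude of one harmonic for edges radiating from a unit's center."""
    total = sum(cmath.exp(1j * harmonic * math.radians(a))
                for a in edge_angles_deg)
    return abs(total) / max(len(edge_angles_deg), 1)

def global_filter(array, harmonic):
    """Pool one harmonic over every unit in the array (max is an assumption)."""
    return max((unit_response(edges, harmonic) for edges in array.values()),
               default=0.0)

# Hypothetical input: a three-way vertex at grid cell (2, 5) ...
scene_a = {(2, 5): [90, 210, 330]}
# ... and the same vertex moved to cell (7, 1) and rotated by 25 degrees.
scene_b = {(7, 1): [115, 235, 355]}

print(global_filter(scene_a, 3), global_filter(scene_b, 3))
```

Both scenes drive the third harmonic filter equally. Edge lengths never enter the calculation, which is how scale invariance shows up in this sketch.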

Top-Down Specific Completion

An even more interesting feature of this system is seen by replacing the bank of tuned filters by a bank of oscillators, which can be made to broadcast a temporal frequency corresponding to any of the patterns of standing waves. This will stimulate a standing wave in the resonant system, i.e. the system can be made to "hallucinate" that pattern as if it were detected in the input.

Because of the many-to-one invariance relationship between the multiple rotations of a pattern and its single tuned oscillator, such top-down priming does not specify the rotation of the resultant pattern, but rather stimulates all rotations simultaneously. In the presence of a weak or partial input on the other hand, the emergent pattern will rotate to the orientation which best matches both the bottom-up input and the top-down prime. This will become clear in the following computer simulation.

The figure below represents a single harmonic unit, and the black arrows represent an input pattern composed in this case of two edges in a near-collinear vertex. The gray shading represents the response of the harmonic ring to this input, as calculated in the simulation. The slider bars to the right of the ring represent the response of the first few harmonics of the system to this input pattern. These sliders thus correspond to the tuned filters. The pattern of these sliders indicates a strong response to the second harmonic, and a weaker response to the third harmonic. The sliders at the far right represent the top-down priming signal corresponding to the tuned oscillators for top-down priming. In the actual system the filters and oscillators would be combined in a single unit, but are shown separately here to separate the bottom-up from the top-down signals. In this case all the top-down sliders are set at the same value, indicating uniform priming of all harmonics.

Given the same visual input, the system was then progressively primed by reducing the second harmonic while boosting the third harmonic prime. At first little change was observed in the system, but then at a certain critical priming value, the system suddenly cascaded into the state shown below, where the second harmonic response dropped dramatically, while the third harmonic rose. At the same time a change was seen in the harmonic ring as an illusory third branch was "hallucinated" from the input pattern. What is interesting in this simulation is that while the top-down prime was non-specific to orientation, the result of this prime was very specific to orientation, making a precise prediction of where a third branch should be expected if this input were interpreted as a three-way, rather than a two-way vertex.

This property of generalized recognition with many-to-one compression from many instances to a single invariant code, combined with specific completion with a one-to-many, or rather one-to-one-of-many specification of the single variant which best matches both the bottom-up input and the top-down prime, is a natural property of harmonic resonance, and as far as I can tell, it is unique to harmonic resonance. This concept will be elaborated further in the example below.

Consider an array of harmonic units on which is projected a visual input, represented by the black lines below, which produces a response in the harmonic units, indicated by the gray regions. A bank of tuned resonators, behaving either as filters or as oscillators, is arranged so as to receive input from, or broadcast output to, the entire array simultaneously.

If a third harmonic pattern, i.e. a three-way vertex were presented to this system, the harmonic units would produce a characteristic third harmonic tone in temporal frequency, which would stimulate the third harmonic resonator regardless of the location, the rotation, or the scale of the pattern, as shown above. In other words the tone remains constant through any rotations, translations, or scaling of the input pattern. This represents a many-to-one transformation from an infinite variety of instances to a single invariant feature which is detected by the tuned filters.

In the top-down mode, the resonator can behave as a tuned oscillator that transmits the third harmonic acoustic frequency and broadcasts it to all the harmonic units in the array. Each unit in the array would attempt to "hallucinate" a triangular vertex, but the overlapping and intersecting signals would cancel each other out, resulting in an approximately uniform pattern of activation, as suggested below.

A visual input applied during such priming would break the symmetry of this mutual inhibition, and triangular stars would begin to appear. Given an input in the form of a single dot, for example, this input would favor triangular stars of all orientations equally at that location, so that with sufficient top-down priming the system would hallucinate the primed feature centered at the dot, at some arbitrary orientation, as shown below. In this case only the location of the primed pattern is constrained by the input, so that the other degrees of freedom remain arbitrary.

If the input supplied is more specific, then the resultant pattern will be further constrained, as shown below, where the input consists of two oriented edges that meet at a vertex. The minimum energy configuration for this input is to have two arms of the third harmonic response coincide with these lines, leaving the third to bisect the external angle with an illusory boundary. This illusory filling-in or completion of the expected form would respond to rotation, or translation of the partial input so as to always perform completion in the proper location and orientation.
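
The way a non-specific prime yields an orientation-specific completion can be sketched in a few lines. This is a simplified stand-in for the simulation, under stated assumptions: the primed pattern is treated as a rigid set of equally spaced arms, and the best match is taken to be the orientation that maximizes the sum of cos(harmonic * (edge angle - phase)) over the input edges. The function complete_pattern is hypothetical, and unlike the simulation described in this document it does not let the arms bend.

```python
import cmath
import math

def complete_pattern(edge_angles_deg, harmonic):
    """Return arm orientations (degrees) of the best-fitting harmonic pattern.

    The fit maximizes sum(cos(harmonic * (angle - phase))) over the input
    edges, which is solved in closed form by the phase of the matching
    Fourier coefficient. Returns None when the input does not constrain
    the orientation (for example a dot, represented here by no edges).
    """
    coeff = sum(cmath.exp(1j * harmonic * math.radians(a))
                for a in edge_angles_deg)
    if abs(coeff) < 1e-9:
        return None
    phase = math.degrees(cmath.phase(coeff)) / harmonic
    step = 360.0 / harmonic
    return sorted((phase + k * step) % 360.0 for k in range(harmonic))

# Two edges 120 degrees apart: two arms coincide with them and the third
# arm bisects the external angle.
print(complete_pattern([30, 150], harmonic=3))  # about [30, 150, 270]

# A near-collinear pair: the rigid pattern settles on a compromise.
print(complete_pattern([0, 170], harmonic=3))   # about [25, 145, 265]

# A dot gives no orientation information, so any rotation fits equally.
print(complete_pattern([], harmonic=3))         # None
```

Rotating the input edges rotates the completed pattern by the same amount, so the illusory third arm follows the input even though the prime itself names only the harmonic and not the orientation.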

Extension to a Hierarchical Representation

The invariance in recognition, and specificity in completion demonstrated above occur through transformation across one level in a hierarchy. The concept can be imagined to extend to multiple levels in a hierarchical representation, preserving the invariance at each level. If the orientational harmonic representation described above represents geometrical form by way of symmetries (= orientational periodicities) between visual edges, then the next higher level in the visual hierarchy might encode symmetries of these symmetries. For example a triangle might be encoded as a three-way arrangement of vertices, whereas a square represents a four-way arrangement of vertices about a center of symmetry. The next figure suggests how such a system can be imagined to operate, to represent a square in a rotation, translation, and scale invariant manner. The square is defined by four right angled "L" vertices arranged in a quadrilaterally symmetrical pattern. This central symmetry is represented in the higher level by an array of orientational harmonic units as described above, responding to a fourth harmonic tone, resulting in a pattern of activation in the form of a four way vertex, or cross.

This pattern of activation propagates in turn down to a similar lower level representation, but this time representing vertices rather than edges. A four-way pattern of vertices suggests a square, and would stimulate boundary and vertex completion at the lower level between the four vertices to form a square, although the spatial scale of the square would be undetermined. Competition between the alternative squares at different scales would suppress the alternatives as soon as one instance becomes dominant. Notice how the central symmetry of the square can occur at any location and orientation, and the pattern of orientations that it generates at the lower level is compatible with squares at any scale. The appearance of any instance of a right-angled vertex in the input would anchor the pattern at that vertex, fixing the approximate orientation of the whole square, but allowing for a considerable amount of flex in the exact location of the other corners of the square. The pattern shown above (right), for example, is sufficiently similar to a square to activate the generalized square node, because it has the same central quadrilateral symmetry, which connects flexibly to four vertices that in turn connect to each other by way of flexible boundaries. This model suggests that the lower-level details of the corners, edges, and surfaces of shapes are not encoded at the higher level, but are left for the lower levels of the hierarchy to complete. Each level of the hierarchy therefore encodes only the information pertinent to the representation at that level, but unlike other proposed schemes of visual abstraction, a top-down reification allows for a transformation of any high level signal into a low-level instantiation of that invariant form.
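
The idea of a higher level that encodes only the symmetry of the vertices can be sketched as follows. The code assumes that the center of symmetry can be approximated by the centroid of the vertices and that each vertex is reduced to a point; neither assumption comes from the text, and symmetry_response and transform are hypothetical helpers.

```python
import cmath
import math

def symmetry_response(vertices, harmonic):
    """Harmonic response of vertex directions about their centroid.

    Only directions from the centroid are used, so the result ignores
    where the shape is (translation) and how large it is (scale). Taking
    the magnitude ignores how the shape is turned (rotation).
    """
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    total = sum(cmath.exp(1j * harmonic * math.atan2(y - cy, x - cx))
                for x, y in vertices)
    return abs(total) / len(vertices)

def transform(vertices, angle_deg, scale, dx, dy):
    """Rotate, scale, and translate a list of (x, y) points."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in vertices]

square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
moved = transform(square, angle_deg=33, scale=2.5, dx=10, dy=-4)
skewed = [(1.1, 0.9), (-1, 1.2), (-0.9, -1), (1, -1.1)]  # a flexed square

for shape in (square, moved, skewed):
    print(round(symmetry_response(shape, 4), 3))
```

The exact square and its rotated, scaled, and translated copy give the same fourth harmonic response, and the flexed quadrilateral gives a response only slightly lower. Nothing about the corners or edges themselves is stored at this level.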

Implications

This model raises several interesting issues of relevance to visual perception and representation. In many visual models, features are extracted from the input by specialized feature detectors, the response to each feature being represented by the activation of the corresponding node. In models that involve competition or cooperation between features, that interaction is often implemented by lateral inhibition or excitation between these higher level nodes. The system proposed here is fundamentally different in the sense that whatever competition or cooperation exists in the model occurs not by way of explicit interactions between high level representations, but rather node by node at the lower level of representation, by way of constructive and destructive interference between competing waveforms in a distributed representation. Indeed, that interaction between competing representations occurs independently of the higher level representation, and can be seen as a low level consistency matching, at the highest possible resolution in the system, between alternative interpretations of the input signal.

The manner in which the top-down priming influences perception in the system is also noteworthy. In the computer simulation presented above, the ambiguous input pattern stimulated both the second and the third orientational harmonics, leading to an activation of the second and third harmonic filters. In a purely feedforward system, this higher level abstraction of the lower level pattern would be the only information available to a recognition or decision process. But this abstraction, with one value for each encoded pattern, is far more impoverished or compressed than the intricate balance of competing waveforms in the lower level cooperative representation. Indeed, the interactions between the multiple waveforms present in that representation are not reducible to a single dimension without a great loss of resolution. During the dynamic simulation, when the top-down bias favoring the third harmonic was first applied, the interactions between waveforms in the cooperative representation were observed to slowly bend the activations of the two branches of the bilateral pattern against their natural tendency to remain collinear, in an attempt to make room for a third branch, even though the bottom-up second harmonic signal still dominated the third harmonic signal. A sudden change occurred as soon as an illusory third branch sprouted, after which the system quickly re-equilibrated to a third harmonic condition as the third branch grew in size and the bottom-up feature representation came to reflect a predominantly third harmonic pattern. In this system, therefore, the priming of the third harmonic resonator initiates a fundamental shift in the balance of forces at the lower level cooperative representation, causing the pattern there to bend as if under stress, as the third harmonic attempts to establish a third branch of activation.

The success of this priming in changing the final percept depends intimately on the context at the lower level, that is, on whether the rest of the inputs at that location, and at neighboring locations in a full simulation, are also consistent with this new trilateral pattern. In a full simulation complete with neighboring units and oriented input, the bending of the two arms of the bilateral pattern would subtly shift the forces in all the neighboring cooperative units, resulting in a different dynamic balance of forces throughout the cooperative layer. This new balance of forces could either promote the growth of the third branch (by increasing the activation at the two and three o'clock orientations), or inhibit it, in a way that would be impossible to predict by analysis of the high level representation alone. The top-down priming in this system therefore does not operate as an "intellectual" competition between the high level abstractions represented by the second and third harmonic resonators. Rather, the competition is more of a "physical" struggle between the low level embodiments of these high level nodes, in the context of the lattice of interacting forces in the cooperative layer. The top level nodes communicate the urgency, and monitor the outcome, of the competition between representations, but the competition itself takes place at the lowest level, between the waveforms themselves. In this system, therefore, the top-down priming is a highly context-sensitive operation.

Return to Steve Lehar