A number of complex and subtle properties observed in a variety of visual illusions can be explained by a single simple mechanism involving harmonic resonances in a dynamic neural network architecture formulated according to the Gestalt principle of global emergent properties from simple local interactions. The proposed mechanism of these harmonic resonances involves the synchronous firing of cells in a neural syncytium mediated by gap junctions or electrical synapses.
Visual illusions offer an invaluable tool for exploring the mechanism of visual perception. Illusory lines and edges appear predictably in response to certain configurations of visual inducers. For example, the illusory sides of the Kanizsa triangle shown in Figure 1(A) appear in response to the short oriented edges at the corners of the triangle due to a colinear alignment of those edges. An illusory boundary also appears when the alignment of the inducing edges is only approximate, as seen in the curved Kanizsa triangle in Figure 1 (B), although the salience of the illusion is somewhat diminished as a result. A number of parametric studies have been conducted to examine the perceptual salience of the illusory boundary in response to various factors. These studies have shown that the salience of the illusory contour is a function of the length of the inducing edges relative to the separation between them, [1] the contrast of the inducers relative to the background, [1] and the spatial alignment of the inducing edges [11].
A: The Kanizsa triangle, and B: the curved Kanizsa triangle exhibit illusory boundary completion by colinearity. C: The illusory boundary can survive a significant bending mis-alignment, but is particularly sensitive to a shearing mis-alignment shown in D. E: relatable inducing edges, and unrelatable edges due to F: acute angle, and G & H: non intersection of the linear extensions to visible edges.
Psychophysical experiments by Kellman and Shipley [11] have revealed a distinction between a bending mis-alignment as shown in Figure 1 (B) and (C), and shearing mis-alignment as shown in Figure 1 (D). While both forms of mis-alignment reduce the salience of the illusory contour, the contour can survive a considerable bending mis-alignment, but is far more sensitive to a shearing mis-alignment. Kellman and Shipley formalized these findings in a mathematical model based on the linear extensions of the visible inducing edges. They propose that edges whose extensions intersect at an obtuse angle are relatable, i.e. capable of forming an illusory contour, while edges whose extensions either fail to intersect, or do so at an acute angle are non-relatable. For example the inducers in Figure 1 (E) are relatable, while those in Figure 1 (F), (G), and (H) are not, because in Figure 1 (F) the linear extensions intersect at an acute angle, while in Figure 1 (G) and (H) they do not intersect. Note that for the purposes of this definition, the linear extension is only defined beyond the inducing edge, not within it. While this model captures some of the properties of illusory contour formation, it has a hard Boolean character which is not observed in the illusory phenomena. For example the illusory contour does not disappear abruptly as the angle between the linear extensions exceeds 90 degrees, but falls off smoothly as a function of that angle, approaching zero magnitude at about 90 degrees. Similarly when the inducers are parallel as in Figure 1 (G), the salience of the illusory contour falls off smoothly (although rapidly) with a shearing mis-alignment, even though the edges are only strictly relatable when they are perfectly aligned. Furthermore, Banton and Levi [1] show that the salience of the illusory contour falls off smoothly with separation between the inducers, and as a function of the length of the inducing edges, even when they remain mathematically relatable.
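The relatability criterion described above can be made concrete with a short sketch. The Python below is an illustration under stated assumptions, not Kellman and Shipley's own formulation: each inducer is reduced to an end point and a direction pointing away from the visible edge, and the function name `relatable` and the tolerance `eps` are hypothetical. It returns a hard yes/no answer, which is the Boolean character that the smooth psychophysical data do not show.

```python
def cross(a, b):
    """2-D cross product (z component)."""
    return a[0] * b[1] - a[1] * b[0]

def dot(a, b):
    """2-D dot product."""
    return a[0] * b[0] + a[1] * b[1]

def relatable(p1, d1, p2, d2, eps=1e-9):
    """Hard relatability test for two inducing edges (illustrative sketch).

    p1, p2: end points of the visible edges.
    d1, d2: directions of the linear extensions, pointing away from each edge.
    The extensions are rays, defined only beyond the edge (t, s > 0).
    """
    r = (p2[0] - p1[0], p2[1] - p1[1])
    denom = cross(d1, d2)
    if abs(denom) < eps:
        # Parallel extensions: relatable only when perfectly aligned and
        # pointing toward each other (no shearing offset).
        return abs(cross(r, d1)) < eps and dot(r, d1) > 0 and dot(d1, d2) < 0
    t = cross(r, d2) / denom  # ray parameter along the first extension
    s = cross(r, d1) / denom  # ray parameter along the second extension
    if t <= 0 or s <= 0:
        return False  # the extensions do not meet beyond the edges
    # The interior angle at the intersection is obtuse exactly when the
    # two extension directions oppose each other.
    return dot(d1, d2) < 0

print(relatable((0, 0), (1, 0), (4, 0), (-1, 0)))     # perfectly aligned: True
print(relatable((0, 0), (1, 0), (4, 1), (-1, 0)))     # sheared, parallel: False
print(relatable((0, 0), (1, 0), (4, 1), (-1, -0.5)))  # bent, obtuse: True
print(relatable((0, 0), (1, 0), (2, 3), (0.5, -1)))   # acute intersection: False
```

An arbitrarily small shearing offset flips the answer from True to False, which illustrates why a graded, field-like account fits the data better.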
These smooth analog properties of the illusory contour suggest a Gestalt field-like influence between inducers rather than the hard mathematical nature of the relatability criteria. For example the linear extension emanating from the inducer might be expressed as a Gaussian field of influence which extends in a colinear manner from the inducing edge, as suggested schematically in Figure 2 (A). According to Gestalt theory, the final configuration of the illusory contour reflects a global emergent pattern from local interactions, like the global spherical shape of a soap bubble which emerges by a relaxation of the local forces of surface tension. A Gestalt model of illusory contour formation therefore would involve a dynamic relaxation of multiple local forces represented by the influence of the inducing edges acting through their smooth Gaussian fields. Figure 2 (B) suggests how the fields of influence from adjacent inducing edges might interact spatially across the featureless space between them, resulting in the formation of an illusory contour that bends to conform to the geometrical arrangement of the inducers.
A: A Gestalt field-like projection from visible edges may account for illusory contour formation by spatial interaction as suggested in B. C: Directed Diffusion architecture where colinear boundary completion occurs by a directed diffusion of oriented activation in a cooperative cell layer by way of spatial receptive fields of the form shown in D. E: A refined form of the receptive field to perform spatial sharpening.
The Directed Diffusion model [12] provides a computational implementation of this Gestalt concept in a neural network architecture. In this model, which is an extension of the Boundary Contour System, [6] oriented edge detectors respond to the presence of oriented edges in the input by spatial receptive fields in the manner of cortical simple cells reported by Hubel and Wiesel [8]. The response of such a cell computed as an image convolution is proportional to the contrast across the edge. Figure 2(C) illustrates two horizontal oriented cells responding to two horizontal edges in the Kanizsa figure by way of oriented receptive fields. Similar edge sensitive cells representing edges of all orientations (not shown) are also present at every spatial location in the system. The spatial interactions responsible for illusory boundary formation occur in a higher level cooperative cell layer, which also contains cells representing all orientations at every spatial location, and these cooperative cells receive input from the corresponding oriented cells in the oriented layer. In Figure 2 (C) for example the activation of the horizontal oriented cells stimulates horizontal cooperative cells at those same locations. Neural activation diffuses in the cooperative layer in an orientation specific manner by way of bipolar receptive fields, whereby cooperative cells receive input from other like-oriented cells in the cooperative layer through the cooperative receptive fields. For example in Figure 2 (C) each horizontal cell in the cooperative layer has a pair of horizontally oriented receptive fields which receive input from other horizontal cooperative cells which are horizontally adjacent to them. Figure 2 (D) illustrates a possible form for the cooperative receptive field, which can be defined by a Gaussian function of radial distance from the cell, as well as a Gaussian function of angular deviation from the line of colinearity. 
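As a rough illustration of the receptive field form just described, the following sketch builds a bipolar kernel as the product of a Gaussian in radial distance and a Gaussian in angular deviation from the line of colinearity. The function name `cooperative_field`, the kernel size, and the widths `sigma_r` and `sigma_theta` are assumptions chosen for illustration; they are not values taken from the model.

```python
import numpy as np

def cooperative_field(size=21, sigma_r=4.0, sigma_theta=0.3, orientation=0.0):
    """Bipolar cooperative receptive field (illustrative sketch).

    Weight = Gaussian(radial distance) * Gaussian(angular deviation from the
    line of colinearity). All parameter values are assumed, not from the paper.
    `size` should be odd so that the cell sits at the central pixel.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r = np.hypot(x, y)
    theta = np.arctan2(y, x) - orientation
    # Fold the angle so both lobes of the bipolar field are treated alike:
    # deviation is 0 on the colinear axis and pi/2 perpendicular to it.
    deviation = np.arcsin(np.abs(np.sin(theta)))
    field = (np.exp(-r**2 / (2 * sigma_r**2))
             * np.exp(-deviation**2 / (2 * sigma_theta**2)))
    field[half, half] = 0.0  # the cell does not feed its own receptive field
    return field / field.sum()

field = cooperative_field()
print(field[10, 13] > field[13, 10])  # on-axis weight exceeds off-axis weight
```

One such kernel would be needed per orientation; passing a different `orientation` (in radians) rotates the two lobes accordingly.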
Cooperative cells which are horizontally adjacent to the active cells depicted in Figure 2 (C) will receive input from the active cells. This secondary activation will continue to propagate to other cooperative cells still further from the input in a colinear direction. A passive decay term in the differential equation governing the cooperative cell prevents runaway positive feedback so that the pattern due to an isolated input will define at equilibrium a spatially decaying trail of activation colinear with the original edge, and extending outward to a distance which is a function of the magnitude or contrast of the original edge. The equilibrium pattern of diffusion from an isolated edge signal would therefore appear similar in form to the Gaussian receptive field shown in Figure 2 (D) except that the final range of diffusion would be much greater than the size of any individual receptive field, as suggested in Figure 2 (C), and the range would be greater still at locations between colinear oriented inducers where the cooperative cells would receive activation from both sides simultaneously. The subjective appearance of the illusory contour in the Kanizsa figure however is not a broad diffuse region as suggested in Figure 2 (B) and (C), but looks more like a sharp well defined edge. In order to achieve this result, an element of spatial sharpening was incorporated in the Directed Diffusion model by the addition of inhibitory sidelobes to the cooperative receptive fields. This was achieved by defining a difference of Gaussians profile as a function of angular deviation as shown in Figure 2 (E), rather than the straight Gaussian profile of Figure 2 (D). This has the effect of boosting the strength of the illusory contour along the crest of greatest magnitude, and suppressing it to either side of that crest. 
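The balance between colinear spread and passive decay described above can be sketched in one dimension. This is a deliberately simplified linear stand-in for the model's cooperative dynamics: the chain represents cells along a single line of colinearity, nearest-neighbour coupling replaces the extended receptive field, and the names `diffuse_1d`, `decay`, and `coupling` and their values are assumptions.

```python
import numpy as np

def diffuse_1d(inputs, decay=1.0, coupling=0.45, dt=0.1, steps=2000):
    """Relax a 1-D chain of cooperative cells to equilibrium (illustrative).

    dx/dt = -decay * x + coupling * (left + right neighbours) + input
    Requires 2 * coupling < decay so that passive decay wins over feedback.
    """
    inputs = np.asarray(inputs, dtype=float)
    x = np.zeros_like(inputs)
    for _ in range(steps):
        left = np.concatenate(([0.0], x[:-1]))   # activation of left neighbour
        right = np.concatenate((x[1:], [0.0]))   # activation of right neighbour
        x = x + dt * (-decay * x + coupling * (left + right) + inputs)
    return x

n = 41
single = np.zeros(n)
single[10] = 1.0          # one isolated edge signal
pair = single.copy()
pair[30] = 1.0            # a second, colinear inducer

trail = diffuse_1d(single)
bridge = diffuse_1d(pair)
print(trail[10:16].round(3))   # decaying trail away from the isolated edge
print(trail[20], bridge[20])   # the midpoint is stronger with two inducers
```

In this linear sketch a stronger input scales the whole trail, so the distance over which activation stays above any fixed threshold grows with input magnitude, in line with the dependence on contrast described in the text.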
At every spatial location a certain cross-talk between adjacent orientations was also defined in the model, resulting in an additional diffusion of activation across orientations at each spatial location. This orientational cross-talk results in a fanning-out of the diffusing signal, as suggested in Figure 2 (A). This feature allows boundary completion to occur around smooth curves, as shown in Figure 2 (B), since the cooperative cells along the curve receive activation from both sides simultaneously, and thus become more active than cells along either line of colinearity. The strength of the resulting illusory contour will however be somewhat diminished, as is seen in the psychophysical studies.
Computer simulations of the Directed Diffusion model [12] show that it can reproduce all of the properties of illusory contour formation discussed above. The salience of the illusory contour is a function of the length of the inducing edge because of mutual support between adjacent cooperative cells along that edge, and is an inverse function of the separation between inducers because of the finite extent of the Gaussian receptive fields which causes the field of activation from an inducing edge to decay with distance from that edge. The salience is also a function of the contrast across the inducing edge, since a greater contrast generates a stronger signal in the oriented edge sensitive cells, and this model also exhibits the properties of the relatability criteria of Kellman & Shipley [11] because the spatial interaction of fields of activation from adjacent inducers approximates the intersection of the linear extensions of the inducing edges. Figure 3 shows the results of computer simulations of some of these effects. In these simulations the inducing edges were approximated by pointlike oriented edge signals at specific locations and orientations, indicated in the figure by a barred circle. The activation in the cooperative layer at equilibrium in response to these inputs is indicated by the gray shading in the computer simulation outputs, which depicts the total cooperative activation for all orientations at each spatial location. The simulations show that each oriented edge signal generates a strong point of activation in the cooperative layer at the location of the oriented input, (seen as a dark central dot) together with a weaker extended field of activation parallel to the orientation of that edge. Adjacent fields of activation interact spatially, resulting in the formation of an illusory contour which joins the inducers with a smooth spline-like curve. 
The magnitude of the illusory contour is shown to be a function of inducer separation and alignment, with a large tolerance for bending mis-alignment, and a small tolerance for shearing mis-alignment. The exact tolerance for each kind of mis-alignment can be adjusted independently to match the properties observed in the psychophysical studies, as can the range of diffusion by adjustment of either the size of the receptive field, or the magnitude of the decay constant. The simulations shown in Figure 3 reproduce only the strength and geometrical form of the illusory contour, not the brightness percept associated with it. Grossberg has shown that the brightness percept can be reconstructed from this kind of boundary image by a spatial diffusion mechanism known as the Feature Contour System, [7] which performs a two-dimensional filling-in of the brightness signal. The dynamic diffusion in the Feature Contour System is somewhat analogous to the filling-in of the boundary signal in the Directed Diffusion model, and has the same Gestalt characteristics of global emergent properties from local interactions.
Computer simulation of the Directed Diffusion model showing how the salience of the illusory contour is influenced by A: spatial separation, B: a bending mis-alignment, and C: a shearing mis-alignment between inducing edges.
The Directed Diffusion model illustrates how Gestalt principles can be embodied in a dynamic neural network architecture which allows quantification of the analog properties observed in illusory phenomena. The key component of the Directed Diffusion model responsible for colinear boundary completion is the bipolar cooperative receptive field, which causes oriented activation to diffuse in a colinear manner. There are a number of visual illusions however which exhibit boundary completion in a non-colinear manner through sharp vertices. This is seen at the corners of the illusory triangle shown in Figure 4 (A), where each illusory vertex is defined by the intersection of two illusory edges. In Figures 4 (B) and (C) the vertices are defined by the intersection of three and four illusory edges respectively at each dot. Figure 4 (D) illustrates the size / spacing constraint discussed by Zucker [16]. The vertical grouping in this figure can be explained by the Directed Diffusion model because a small circular dot stimulates oriented edge detectors of all orientations equally, which allows colinear completion of the vertical component of the local oriented signal. The Directed Diffusion model cannot however explain the observed suppression of a horizontal grouping by the presence of a stronger vertical grouping. A similar distance dependent relationship is seen in Figure 4 (B), where the three-way grouping at each dot suppresses the perception of equally present vertical and horizontal groupings, and in Figure 4 (C) where the orthogonal grouping suppresses perception of an equally valid diagonal grouping. Finally, the Ehrenstein illusion shown in Figure 4 (E) exhibits a grouping which is orthogonal, rather than parallel to the inducing lines. The Directed Diffusion model cannot account for any of these phenomena. 
Indeed it is the inability of the Directed Diffusion model to handle the condition of multiple orientations at a single spatial location which necessitated the use of point-like oriented inputs in the computer simulations shown in Figure 3, rather than a more realistic input derived from a convolution of an input image with a set of oriented edge filters, because the Directed Diffusion model would receive an ambiguous signal at the corner where the straight inducing edge meets the circumference of the occluded black circle, which would disrupt the propagation of the illusory contour beyond that point.
A number of visual illusions cannot be explained by a colinear grouping, including those involving illusory vertices defined by the intersection of A: two, B: three, or C: four illusory boundaries. D: exhibits a distance dependent relationship where the closer vertical grouping suppresses a more distant horizontal percept. E: exhibits a grouping orthogonal to, rather than parallel to the inducing edges.
An extension to the Directed Diffusion model, the Orientational Harmonic model [12] accounts for all of the phenomena depicted in Figure 4, and many more by way of a single simple mechanism. This model proposes that the cells in the cooperative layer representing different orientations at a single spatial location are arranged in rings, as shown in Figure 5 (A), the cell at three o'clock representing a horizontal edge to the right, nine o'clock a horizontal edge to the left, twelve o'clock a vertical edge above, etc. This arrangement is consistent with the pinwheel model of cortical organization proposed by Braitenberg [2]. Each cell in the cooperative ring receives oriented input directly from the corresponding oriented cell at the same location, as well as indirectly from neighboring regions of the cooperative layer by way of monopolar receptive fields which receive oriented activation from like-oriented cooperative cells at adjacent locations in the oriented direction. For example the cell at three o'clock receives activation directly from the horizontal oriented cell at the same location (suggested by the shaded oriented cell in the figure) and indirectly from horizontal oriented cells displaced in the three o'clock direction by way of a cooperative receptive field which receives cooperative activation from that direction. According to the Orientational Harmonic model, the cells in the cooperative ring are coupled so as to support standing waves of circular harmonic resonance within the cooperative ring. Harmonic resonance, whether mechanical, acoustical, or electrical, is a fundamental property of all physical systems, and has the property of sub-dividing the resonating system into integer numbers of equal intervals of alternating active and inactive regions. For example Figure 5 (B) illustrates the first four harmonics of acoustical vibration in a linear tube, like a flute, where the grey shading denotes regions of high amplitude oscillation. 
Figure 5 (C) illustrates the first four harmonics of oscillation in a circular resonant system, like a closed circular tube. In an orientational representation such as the one proposed by this model, the patterns of standing waves depicted in Figure 5 (C) represent patterns of edges intersecting at a vertex, as shown in Figure 5 (D). For example the fourth harmonic represents a four-way, or "+" vertex, the third harmonic represents a three-way "Y" vertex, the second harmonic represents a straight-through or colinear feature, while the first harmonic represents a single edge which terminates at the center, or an end-stop feature. There is also a zeroth harmonic which represents edge signals at all orientations equally, which is the pattern seen in response to a small circular dot. The mathematical formulation of the Orientational Harmonic model is described in full in the appendix.
A: Orientational Harmonic architecture where the cooperative cells at each location are arranged in rings which sustain harmonic oscillations, resulting in periodic patterns of alternating active and inactive regions. B: Linear harmonic resonance divides the resonating system into linear periodic patterns, and C: circular harmonic resonance defines circular periodic patterns. In an orientational representation these periodic patterns represent D: various types of vertices defined by the intersection of edges at a point.
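The correspondence between circular harmonics and vertex types can be illustrated with a few lines of Python. This is only a sketch: it idealizes each standing wave as a cosine around the ring, uses clock positions (0 for twelve o'clock) as the angular coordinate, and assumes a phase that aligns one peak with twelve o'clock. The names `harmonic_wave`, `harmonic_peaks`, and `VERTEX_TYPES` are hypothetical.

```python
import math

VERTEX_TYPES = {1: "end-stop", 2: "colinear edge", 3: "Y vertex", 4: "+ vertex"}

def harmonic_wave(n, hour, phase_hour=0.0):
    """Amplitude of the n-th circular harmonic at a clock position (0-12)."""
    return math.cos(n * 2 * math.pi * (hour - phase_hour) / 12)

def harmonic_peaks(n, phase_hour=0.0):
    """Clock positions (0 = twelve o'clock) of the n positive peaks."""
    return sorted(round((phase_hour + 12 * k / n) % 12, 6) for k in range(n))

for n in range(1, 5):
    print(n, VERTEX_TYPES[n], harmonic_peaks(n))
# 1 end-stop [0.0]
# 2 colinear edge [0.0, 6.0]
# 3 Y vertex [0.0, 4.0, 8.0]
# 4 + vertex [0.0, 3.0, 6.0, 9.0]
```

Each harmonic n places n equally spaced peaks around the ring, which is why the same mechanism yields an end-stop, a straight edge, a three-way vertex, and a four-way vertex.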
The Orientational Harmonic theory makes specific predictions about interactions between local oriented edges, and about the global emergent patterns resulting from specific input stimuli. Consider for example the pattern of vertical lines depicted in the first column of Figure 6 (A). In the following discussion we will consider only the first four harmonics of oscillation, which comprise the most significant response of the system since higher harmonics in a resonating system tend to be suppressed due to higher impedance at higher frequencies. Consider a point at the bottom end of one of the lines in the figure, circled in the magnified view shown in the second column of Figure 6 (A). The pattern of activation in the cooperative ring at that location would consist of a strong input at the twelve o'clock orientation. The first harmonic response to this input pattern would produce a peak at twelve o'clock, and a negative peak, or suppression of activation at six o'clock. The second harmonic response to this same input pattern would attempt to complete the pattern as a colinear edge, with positive peaks at twelve and six o'clock, and negative peaks at three and nine o'clock. The third harmonic response to this same input would consist of peaks at twelve, four, and eight o'clock with negative peaks in between, and the fourth harmonic response would exhibit positive peaks at six, twelve, three, and nine o'clock. Figure 5 (C) illustrates the pattern of these four harmonics. All four of these harmonic responses would be partially stimulated by the input at twelve o'clock, although the first harmonic would find the closest match, and thus would produce the strongest response. The different harmonics are in a dynamic balance with one another by constructive and destructive interference between waveforms in the orientational representation. 
For example the positive peak of the second harmonic at six o'clock is balanced by the negative peak of the first harmonic at that orientation. Also the negative peaks of the second harmonic at three and nine o'clock balance the positive peaks of the fourth harmonic at those same orientations. An increase in any one harmonic will therefore affect the dynamic balance between all of the harmonics. For example, the presence of a nearby vertical edge at the six o'clock orientation will produce a weak activation at that orientation due to the distant influence through the cooperative receptive fields which, in conjunction with the strong peak at twelve o'clock would promote the second harmonic or colinear grouping percept in the vertical direction. This in turn would suppress the first and the fourth harmonics, as well as the third harmonic, resulting in a predominantly colinear percept in the form of a vertical grouping percept. The third and fourth columns of Figure 6 (A) depict the input and output respectively of a computer simulation of this phenomenon. Unlike the somewhat artificial simulations of the Directed Diffusion model, these simulations were performed with a spatial convolution of the input image using a set of orientation specific edge filters, calculated as described in the appendix. An orientational harmonic ring was calculated at every pixel location, and the fourth column of Figure 6 represents the equilibrium activation of the system for all orientations combined, a dark shade of grey representing regions of high activation at one or more orientations. The darkest shades are seen in regions corresponding to a direct input, for example along the edges of the vertical lines in the third column of Figure 6 (A). Note that a single line in the computer simulation input corresponds to a double line in the simulation output, representing the dark/light and light/dark edges on either side of that line. 
The lighter shaded lines in the simulation output which do not correspond to edges in the input image represent illusory boundaries or grouping lines stimulated by that input. In the last column of Figure 6 (A) for instance a strong vertical shading is observed, corresponding to the strong vertical grouping percept observed in the figure. Notice also the weaker diagonal and horizontal lines in the simulation output, representing the much weaker third and fourth harmonic responses respectively. By symmetry, the harmonic explanation described for the bottom end of each vertical line applies equally to the top ends of the lines where the same harmonic patterns occur upside down.
Various illusory phenomena and their Orientational Harmonic explanations and computer simulations.
The grouping of vertical lines depicted in Figure 6 (B) promotes a weak diagonal grouping percept, or an illusory zig-zag edge between the line endings. This corresponds to the third harmonic response since the vertex located at the bottom of each line receives distant activation from the four and eight o'clock orientations as shown in the magnified view in the second column of Figure 6 (B), which is consistent with the third harmonic pattern. This in turn suppresses the second and fourth harmonic responses. The computer simulation output for this figure exhibits this diagonal grouping, although the effect is rather weak. This is consistent with the fact that the diagonal grouping percept itself is rather weak.
Figure 6 (C) illustrates a horizontal grouping of vertical lines due to the promotion of the fourth harmonic, because the bottom of each line ending receives input both from the six o'clock orientation and from three and nine o'clock, due to fourth harmonic responses at horizontally neighboring line endings. This simulation illustrates the Gestalt concept of global emergent properties from local interactions: there are initially no inputs from three and nine o'clock, and these inputs arise only as a secondary effect of an emergent fourth harmonic response at all the line endings simultaneously. Figure 6 (C) shows a computer simulation of this phenomenon exhibiting a strong horizontal grouping percept orthogonal to the line endings. A similar orthogonal grouping emerges from a single set of parallel line endings due to fourth harmonic grouping, as is seen in the Ehrenstein illusion shown in Figure 4 (E). The effect is also seen in attenuated form at the top and bottom of Figure 6 (A), and also Figure 6 (F), although it is much suppressed by the second harmonic at the center of those figures.
A special case of the Ehrenstein figure is shown in Figure 6 (D) which is reported by subjects [3] to produce either an illusory circle, square, or diamond figure. This illusion therefore appears to rest on a saddle point in perceptual space, which can be perturbed in one of three stable directions. The appearance of any one of the illusory figures however precludes the appearance of the other two. The Orientational Harmonic model explains all of these phenomena by way of a competition, or destructive interference between the second, third, and fourth harmonic waveforms. The second harmonic promotes a colinear completion of the illusory contour orthogonal to the line ending, corresponding to the circular illusory percept; the third harmonic promotes a three-way completion at the line ending, corresponding to the diamond percept, and the fourth harmonic promotes a four-way completion at both the line endings and at the corners of the illusory figure, corresponding to the illusory square. The Orientational Harmonic simulation shown in the last column of Figure 6 (D) exhibits traces of all three of these illusory phenomena.
The Orientational Harmonic model also accounts for the illusory grouping of squares, as shown in Figure 6 (E). At each corner of the square, for example at the top right corner circled in the magnified view in Figure 6 (E), the input signal consists of two orthogonal oriented edges, in this case at six and nine o'clock. The first harmonic response to this input produces a peak at the internal bisector of the two edges, i.e. at 7:30 o'clock, which corresponds to the center of the half-circle containing the greatest oriented signal. The second harmonic would produce a zero response to this input, since its positive and negative peaks are separated by exactly 90 degrees. The third harmonic would attempt to align optimally with the oriented input, with two peaks centered at 5:30 and 9:30 o'clock, leaving a third peak to define an illusory outward projection diagonally at 1:30 o'clock, and the fourth harmonic would align two of its peaks with the six and nine o'clock inputs, leaving two illusory outward projections at twelve and three o'clock. The Orientational Harmonic model therefore predicts illusory grouping lines projecting outward from a square in orthogonal and diagonal directions, as shown in the second column of Figure 6 (E), corresponding to the fourth and third harmonics respectively. These two harmonics however compete with each other by destructive interference, because the diagonal corner projection of the third harmonic corresponds to the negative peak of the fourth harmonic between the orthogonal edge projections. An increase in the strength of the third harmonic would therefore suppress the effect of the fourth harmonic, and vice-versa. This competition is seen in the illusory grouping percept of Figure 6 (E), where the orthogonal alignment of adjacent squares boosts the fourth harmonic signal at each corner, promoting an orthogonal grouping percept, which in turn suppresses the percept of a diagonal grouping. 
Additional computer simulations (not shown here) reveal that the removal of alternate squares from this figure generates an emergent diagonal grouping percept, which in turn suppresses the perception of an orthogonal grouping.
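The harmonic responses at the corner of a square described above can be checked with a small numerical sketch. A plain Fourier projection of two unit impulses is assumed here as a stand-in for the model's resonance dynamics, whose actual formulation is given in the appendix and is not reproduced; the function name `harmonic_response` is hypothetical, and clock positions are used as the angular coordinate.

```python
import cmath
import math

def harmonic_response(input_hours, n):
    """Project unit impulses at the given clock positions onto harmonic n.

    Returns (amplitude, peak_hours). A plain Fourier projection is assumed
    here as a stand-in for the model's own resonance dynamics.
    """
    c = sum(cmath.exp(1j * n * 2 * math.pi * h / 12) for h in input_hours)
    amplitude = abs(c)
    if amplitude < 1e-9:
        return 0.0, []  # this harmonic is not excited by the input
    phase_hours = cmath.phase(c) * 12 / (2 * math.pi)
    # The n peaks of the best-fitting cosine, expressed as clock positions.
    peaks = sorted(round((phase_hours + 12 * k) / n, 6) % 12 for k in range(n))
    return amplitude, peaks

corner = [6, 9]  # top right corner of a square: edges at six and nine o'clock
for n in (1, 2, 3, 4):
    print(n, harmonic_response(corner, n))
```

Under these assumptions the first harmonic peaks at 7.5 (7:30), the second harmonic has zero amplitude, the third peaks at 1.5, 5.5, and 9.5 (1:30, 5:30, and 9:30), and the fourth peaks at 0, 3, 6, and 9 (twelve, three, six, and nine o'clock), matching the alignments stated in the text.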
Finally, Figure 6 (F) demonstrates the size/spacing constraint discussed by Zucker [16], whereby a closer vertical spacing of dots suppresses an equally valid horizontal grouping percept, due to destructive interference between the second and fourth harmonics, because the horizontal grouping lines of the fourth harmonic at three and nine o'clock are suppressed by the negative peaks of the second harmonic response at those same orientations. Lehar [12] shows that removal of alternate rows of dots in this figure restores the horizontal grouping by disinhibition, at the expense of the vertical grouping, even though the horizontal dot spacing remains unchanged. The distance-dependent grouping phenomena seen in Figure 4 (B) and (C) are also explained by harmonic interactions, and have been reproduced in computer simulations [12].
None of the visual phenomena presented in Figure 4 and their replication in computer simulations provide conclusive proof of the Orientational Harmonic model. The fact that this model explains all of these diverse phenomena with a single simple mechanism however makes a strong case for Orientational Harmonics as a mechanism in visual perception. A number of the phenomena shown here have been addressed individually by different models, but no model has yet even attempted to account for such a diversity of phenomena with a single mechanism. The phenomenon of colinear boundary completion is the most straightforward effect, and has been modeled for example by Grossberg's Boundary Contour System (BCS), [6] Walter's Rho Space, [13] and Zucker's curvature operators [15]. All of these models have as their central mechanism some variation of the colinear edge detector cell, which responds to a colinear (or cocircular) arrangement of local edge responses of oriented simple cells, in the manner of the Directed Diffusion model. Some of these models have incorporated some kind of cooperation or competition between orthogonal orientations at each spatial location in order to account for orthogonal end-cut effects as seen in the Ehrenstein illusion of Figure 4 (E), but this kind of model can never account for the generalized vertex completion phenomena seen in Figure 4 (A), (B), and (C). In the original BCS paper [6] Grossberg proposed that the cooperative cell with bipolar receptive fields might also occur in variations with angles other than 180 degrees between the two lobes; for example "L" vertex and "V" vertex detectors with 90 degrees and 30 degrees between the two lobes would account for right angled and acute angled completion. Other cells with three or more lobed receptive fields might account for completion across "T" and "X" vertices. 
The requirement to have a specialized cell for every vertex configuration however leads to a combinatorial problem, as each specific cell type would have to exist at every orientation at every spatial location in the cortex, with appropriate cooperative and competitive interactions defined between different feature cells. This line of reasoning was never elaborated into a complete theory. A similar combinatorial problem is encountered in Zucker's cocircularity detector cells [15], where "cooperative" cells specialized for edges of every curvature must be replicated at every orientation and every spatial location in the model. This model would suffer further combinatorial problems when extended to vertices composed of more (or fewer) than two edges. Wilson and Richards [14] present psychophysical evidence that colinear completion gives way abruptly to vertex completion at a particular curvature, which they proposed was evidence for two different mechanisms of curve detection. The same phenomenon was discussed by Kanizsa [10], who illustrated the effect with illusory curves composed of lines of dots, showing that the colinear grouping percept gives way to a sharp vertex grouping at about the same curvature observed by Wilson & Richards. Lehar [12] shows that this effect can be explained by a transition from the second to the third harmonic in the Orientational Harmonic model.
The success of the Orientational Harmonic model suggests that some kind of harmonic resonance occurs within the visual system. The exact nature of this mechanism remains an open question, but since harmonic resonances are observed throughout physical systems in the form of either physical or electromagnetic vibrations, a number of possible explanations suggest themselves. One possibility is that the cells in the orientational ring transmit a physical impulse with each spiking discharge, and that cells receiving such an acoustical pulse would tend to respond by spiking themselves. A group of nearby active cells would thereby tend to fire in phase with one another by an acoustical resonance in a circular acoustical cavity. This explanation would require some physical mechanism to channel the acoustical wave around the orientational ring, so that the signal from the cell at three o'clock, for example, could not propagate directly to the one at nine o'clock without first passing through the cells at the six or twelve o'clock orientations.
A more likely mechanism is suggested by the fact that synchronous firing has been observed in cells which are connected by gap junctions, or electrical synapses, into a single neural syncytium [9]. An electrical synapse is a direct physical connection between the cytoplasm of adjacent cells, so that ions can flow directly from one cell to another. Cells connected in this manner thus behave somewhat like a single larger cell, and communication by way of gap junctions is orders of magnitude faster than that through chemical synapses. Conventional wisdom holds that electrical synapses are important only in simple animals and for very simple reflex reactions [9], and play no significant role in higher level computation in more complex nervous systems. Some studies have even suggested that electrical synapses do not occur in the vertebrate brain. Recent work by Dermietzel and Spray [4], however, suggests that gap junctions have been poorly characterized because they have been difficult to study, and that gap junctions have more recently been found in virtually every cell type, including in the vertebrate brain, not only in glial cells, where they are particularly abundant, but also in neural tissue. Furthermore, Dermietzel and Spray [4] show that the degree of coupling is not static, but is subject to high plasticity regulated by either developmental or functional factors, including neurotransmitter effects. Although plasticity is not a requirement for the Orientational Harmonic model, it suggests a hitherto unsuspected complexity and significance of the electrical synapse. Another reason that gap junctions have received less attention is that the simple nature of their transmission suggests that nothing interesting occurs at the electrical synapse besides a direct transmission of the electrical signal, as along the dendrite of a single cell.
All of the interesting switching operations underlying neural computation must therefore supposedly result from the gating effects seen at the chemical synapse. The Orientational Harmonic theory suggests how the electrical synapse might indeed contribute to interesting neural computation, not through gating effects applied at the synapse, but through the global harmonic resonance within the syncytium as a whole, made possible by the presence of the electrical synapse.
The Orientational Harmonic theory is further supported by recent findings that synchronous firing between distant cells appears to play a role in visual representation. Eckhorn et al. [5] found that pairs of oriented edge sensitive cells in the visual cortex fire synchronously when responding to edges which are perceived to be colinear, even when that colinearity is illusory as in the case of the Kanizsa figure. When responding to individual non-colinear edges, those same cells fire asynchronously. According to the Orientational Harmonic model, the colinearity relation between adjacent oriented edge signals is represented by a second harmonic oscillation in an orientational ring located between those edges, which would serve to synchronize the neural firing at the local edge detector cells on opposite sides of that ring.
The Orientational Harmonic model accounts for a surprising diversity of subtle and complex perceptual effects, which no previous model has attempted to explain with a single mechanism. The fact that the mechanism proposed by the model is physically so simple lends further credence to the model. The fact that this mechanism, harmonic resonances mediated by gap junctions, has never been found neurophysiologically is most likely because neurophysiologists have never had reason to look for it, although corroborating evidence has recently emerged from a number of diverse lines of research, including the ubiquity of gap junctions in all neural tissue and the synchronous firing of cortical feature detectors. A great deal of theoretical and experimental work remains to be done before the computational theory of Orientational Harmonics can be united with the single cell neurophysiological findings. The implications of this model for neural computation are considerable, however, since they suggest that the simple transmission characteristics of the gap junction do not restrict this type of synapse to simple processing as was hitherto supposed, but that complex and subtle global properties can emerge from such simple local interactions, as suggested by Gestalt theory. If gap junctions do indeed play an important role in neural communication, this would have significant implications for the speed of neural processing, which would no longer be restricted to the speed of chemical transmission; complex spatial computation might then occur in the brain at speeds orders of magnitude faster than hitherto suspected. Indeed, this theory suggests that the conventional mechanisms of spiking neurons and chemical transmission may represent only the tip of the iceberg of neural processing.
[1] Banton, T. & Levi, D. (1992) The Perceived Strength of Illusory Contours. Perception & Psychophysics, 52, 676-684.
[2] Braitenberg, V. (1984) Charting the Visual Cortex. In Allen Peters & Edward G. Jones (Eds.), The Cerebral Cortex, New York, Plenum Press.
[3] Coren, S., Porac, C., & Theodor, L. H. (1986) The Effects of Perceptual Set on the Shape and Apparent Depth of Subjective Contours. Perception & Psychophysics, 39 (5), 327-333.
[4] Dermietzel, R., & Spray, D. (1993) Gap Junctions in the Brain: Where, What Type, How Many, and Why? Trends in Neurosciences, 16 (5), 186-191.
[5] Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., & Reitboeck, H. J. (1988) Coherent Oscillations: A Mechanism of Feature Linking in the Visual Cortex? Biol. Cybern., 60, 121-130.
[6] Grossberg, S., & Mingolla, E. (1985) Neural Dynamics of Form Perception: Boundary Completion, Illusory Figures, and Neon Color Spreading. Psychological Review, 92, 173-211.
[7] Grossberg, S., & Todorovic, D. (1988) Neural Dynamics of 1-D and 2-D Brightness Perception: A Unified Model of Classical and Recent Phenomena. Perception & Psychophysics, 43, 241-277.
[8] Hubel, D. H. & Wiesel, T. N. (1968) Receptive Fields of Single Neurones in the Cat's Striate Cortex. J. Physiol., 195, 215-243.
[9] Kandel, E. R., & Siegelbaum, S. (1985) Principles Underlying Electrical and Chemical Synaptic Transmission. In Kandel, E. R., & Schwartz, J. H. (Eds.), Principles of Neural Science, New York, Elsevier Science Publishing Co.
[10] Kanizsa, G. (1987) Quasi-Perceptual Margins in Homogeneously Stimulated Fields. In Petry, S. & Meyer, G. E. (Eds.), The Perception of Illusory Contours, New York, Springer Verlag (pp. 40-49).
[11] Kellman, P. J. & Shipley, T. F. (1991) A Theory of Visual Interpolation in Object Perception. Cognitive Psychology, 23, 141-221.
[12] Lehar, S. (1994) Directed Diffusion and Orientational Harmonics: Neural Network Models of Long-Range Boundary Completion through Short-Range Interactions. Ph.D. Thesis, Boston University.
[13] Walters, D. K. W. (1986) A Computer Vision Model Based on Psychophysical Experiments. In H. C. Nusbaum (Ed.), Pattern Recognition by Humans and Machines, New York, Academic Press.
[14] Wilson, H. R. & Richards, W. A. (1989) Mechanism of Contour Curvature Discrimination. J. Opt. Soc. Am. A, 6, 1.
[15] Zucker, S. W., Dobbins, A., & Iverson, L. (1989) Two Stages of Curve Detection Suggest Two Styles of Visual Computation. Neural Computation, 1, 68-81.
[16] Zucker, S. W., & Davis, S. (1988) Points and Endpoints: A Size Spacing Constraint for Dot Grouping. Perception, 17, 229-247.
Note for HTML version: If your browser does not load the "Symbol" font, there will be problems with some of the following equations. Pi appears as p, theta appears as q, alpha appears as a, etc. If you see proper greek letters here, this problem does not apply to you.
A: Architecture of the Orientational Harmonic model with labels that relate to the equations defined in the appendix. B: Schematic depiction of the feature representation at the oriented filter response, the oriented cell layer, and the cooperative cell layers.
The general architecture for the Orientational Harmonic model is depicted in Figure 7 (A). An input image is projected onto the image layer, which consists of a two dimensional matrix of cells Ixy. Cells Oxyθ in the oriented layer receive activation from the input layer by way of oriented receptive fields centered at location (x,y), and with orientation θ. This is accomplished by way of spatial convolution of the input image with the oriented filters, such that
(EQ 1)
where Fijp is the oriented edge detector in the form of a Gabor filter defined by
(EQ 2)
where f is the spatial frequency, k is a scale factor that determines the scale of the Gaussian envelope function, u = k sin(θ) and v = k cos(θ), where θ is the oriented direction of the filter defined by θ = p(2π/d), and d is the number of orientations, so that 2π/d is the angular difference between adjacent orientations. In the computer simulations d = 12 was used, i.e. 12 orientations at 30 degree intervals. The Gabor filter defined by Equation 2 generates a response that is sensitive to the direction of contrast of the edge, as shown schematically in Figure 7 (B), so that an edge at orientation p that would produce a positive response in filter Fijp would also produce a negative response in filter Fij(p+π). The absolute value function in Equation 1 produces a positive response from both of these filters, thus creating a response that is insensitive to direction of contrast. In fact, the oriented cell responses to filters Fijp and Fij(p+π) would be exactly equal, so that the oriented filtering of Equation 1 need only be performed through a range of orientations θ = {0 to π}, where orientation θ = p mod (π).
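The oriented filtering step described above can be illustrated with a minimal sketch. It is not the filter of Equation 2: the kernel below is a generic odd-phase Gabor (a sine carrier under an isotropic Gaussian envelope), and the names `gabor_filter` and `oriented_response`, the kernel size, and the values of `freq` and `k` are all assumptions chosen for illustration. In this sketch the angle is the direction across the edge, which may differ from the convention used in the simulations. The point it shows is only that taking the absolute value makes filters half a turn apart respond identically.

```python
import math

def gabor_filter(theta, size=7, freq=0.15, k=2.5):
    """Odd-phase Gabor kernel; theta (radians) is the direction across the edge.

    size, freq and k are illustrative values, not parameters from the paper.
    """
    half = size // 2
    kernel = []
    for j in range(-half, half + 1):            # row offset (y)
        row = []
        for i in range(-half, half + 1):        # column offset (x)
            # Signed distance from the kernel centre, measured across the edge.
            d = i * math.cos(theta) + j * math.sin(theta)
            envelope = math.exp(-(i * i + j * j) / (2.0 * k * k))
            row.append(envelope * math.sin(2.0 * math.pi * freq * d))
        kernel.append(row)
    return kernel

def oriented_response(image, x, y, kernel):
    """Absolute value of the filter response centred at image location (x, y)."""
    half = len(kernel) // 2
    total = 0.0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            total += kernel[j + half][i + half] * image[y + j][x + i]
    return abs(total)

# Test image: a vertical step edge, dark on the left and bright on the right.
SIZE = 15
image = [[1.0 if x >= 8 else 0.0 for x in range(SIZE)] for y in range(SIZE)]

D = 12  # 12 filter directions at 30 degree intervals, as in the text
responses = []
for p in range(D):
    theta = p * 2.0 * math.pi / D
    responses.append(oriented_response(image, 7, 7, gabor_filter(theta)))

for p, value in enumerate(responses):
    print(p * 30, "degrees:", round(value, 4))
```

Directions p and p + 6 give the same value up to rounding, so only half of the filters need to be evaluated, which is the economy noted in the text.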
A ring of N cooperative cells Cr receives input from N oriented cells Oθ from the oriented layer, as well as N inputs from the cooperative receptive field responses Lr. The orientation r in the cooperative layer makes a distinction between edges of orientation θ that are to one side or another of the center of the ring even if they are otherwise parallel, so that for example in Figure 7, the horizontal edge to the right and the horizontal edge to the left of the center of the ring produce distinct responses in the cooperative representation. The orientations r therefore range from r = {0 to 2π}, while orientations θ range from θ = {0 to π}, as shown schematically in Figure 7 (B), so that the oriented response at orientation θ provides input to two cooperative cells of orientations r and r+π. In other words, orientation θ = r mod (π).
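The relation between the two orientation ranges can be made concrete with a short indexing sketch. The cell counts and the function names `oriented_index` and `cooperative_indices` are hypothetical; the sketch assumes six oriented cells spanning half a turn and twelve cooperative cells spanning the full ring, at equal angular spacing.

```python
N_ORIENTED = 6                     # oriented cells span half a turn (assumed count)
N_COOPERATIVE = 2 * N_ORIENTED     # cooperative cells span the full ring

def oriented_index(r_index):
    """Oriented cell that feeds cooperative cell r_index (orientation modulo half a turn)."""
    return r_index % N_ORIENTED

def cooperative_indices(theta_index):
    """The two cooperative cells, half a turn apart, fed by one oriented cell."""
    return (theta_index, theta_index + N_ORIENTED)

# Each oriented response is copied to two opposite positions on the ring.
oriented = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]
ring_input = [oriented[oriented_index(r)] for r in range(N_COOPERATIVE)]
print(ring_input)
```

The oriented input alone is therefore always symmetric across the ring; any distinction between the two sides has to come from the other inputs to the cooperative cells.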
The cooperative receptive field responses Lr receive input from regions of the cooperative layer by way of monopolar receptive fields. These receptive fields Lr are defined by
(EQ 3)
which produces a filter of the form sketched in Figure 2 (E). The final pattern of activation of the cooperative cell is also influenced by harmonic oscillations within the cooperative ring, which can be calculated as another input Hr to the cooperative cell. The activation of the cooperative cell is governed by the differential equation
(EQ 4)
The decay term ensures that activation decays to zero in the absence of any input. The shunting terms ensure that activation remains bounded between zero and the saturation function value B, by gating the excitatory and inhibitory terms to zero as they approach these upper and lower bounds respectively. The function [ ]+ is a positive half-wave rectification, i.e. only positive values are preserved and negative values are mapped to zero, while the function [ ]- is a negative half-wave rectification, which preserves only negative values. The inputs to this equation are Or, Lr, and Hr. Or is the oriented signal, and Lr is the cooperative activation in an adjacent neighborhood as sampled by the cooperative receptive field Lr by the equation
(EQ 5)
where Cxyr is the cooperative cell at location x,y, representing orientation r. The harmonic oscillations within the ring of cooperative cells perform a Fourier filtering of the cooperative signal within the ring by the filtering function with peaks at the harmonic frequencies. In the simulations, this function was approximated by a finite comb function with uniform values at the harmonic frequencies, and zeros elsewhere. A Fourier filtering with this simplified function can be equivalently calculated by convolution with a series of sinusoids Fj with orientational frequency j defined by
(EQ 6)
where j=1,...,M and i=0,...,N. Here, filter Fj of harmonic j of a total of M harmonics is calculated for each of the N cooperative cell values. A zeroth component filter is also constructed in order to measure the DC component of the orientational signal. The filter for the zeroth harmonic is defined by
(EQ 7)
where c is a positive constant. These filter profiles look like the waveforms sketched in Figure 5 (C). The filters are convolved with the N cooperative cell values of the N orientations, thereby producing a set of response values r given by
(EQ 8)
for each harmonic coefficient j and for each oriented location i. A total response R is computed for each harmonic by summing over the individual responses at each orientation using the formula
(EQ 9)
The magnitude of this coefficient for each harmonic j represents the magnitude of the response of the system to this harmonic. For example, a large value of R2 would indicate the presence of a strong second harmonic component in the pattern of orientations in the cooperative cells.
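The harmonic analysis of the ring can be sketched as follows. The exact filters and summation of Equations 6 to 9 are not reproduced here: the sketch assumes unit-amplitude cosine filters, circular convolution around the ring, and a total response formed by summing the absolute values of the per-cell responses. The function names `harmonic_filter`, `ring_convolve`, and `total_response` are illustrative only, and the zeroth (DC) filter is omitted.

```python
import math

def harmonic_filter(j, n):
    """Cosine filter of orientational frequency j sampled at n ring positions (assumed form)."""
    return [math.cos(2.0 * math.pi * j * k / n) for k in range(n)]

def ring_convolve(signal, kernel):
    """Circular convolution of the cooperative values with a filter around the ring."""
    n = len(signal)
    return [sum(signal[k] * kernel[(i - k) % n] for k in range(n))
            for i in range(n)]

def total_response(signal, j):
    """Assumed total response: sum of absolute per-cell responses to harmonic j."""
    per_cell = ring_convolve(signal, harmonic_filter(j, len(signal)))
    return sum(abs(v) for v in per_cell)

# A colinear configuration: two active cells on opposite sides of a 12-cell ring.
N = 12
ring = [0.0] * N
ring[0] = 1.0
ring[6] = 1.0

R = {j: total_response(ring, j) for j in (1, 2, 3)}
for j in sorted(R):
    print("harmonic", j, "total response", round(R[j], 6))
```

Under these assumptions the two opposed peaks cancel in the first and third harmonics and reinforce in the second, which matches the reading of a large R2 as the signature of a colinear pattern.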
So far, what has been described is a feedforward process to detect the presence of various orientational harmonics in the cooperative signal, much like a Fourier analysis of that signal. The harmonic resonances in the ring of cells also influence the resultant pattern in those cells by constructive and destructive interference between competing harmonics in the representation, like a Fourier filtering of that signal. This was calculated by summing the total harmonic contribution Hi of all the various harmonic responses for each cell in the ring of cells, using the formula
(EQ 10)
This harmonic response, together with the oriented input signal Oi and the pattern of activation in adjacent regions of the cooperative layer Li, all contribute to activation of the cooperative cell Ci, as described in Equation 4. The fact that the cooperative receptive field Li of cell Ci does not receive input from some previous layer, but from within the same cooperative layer, represents a recurrent feedback loop within the cooperative layer. On a smaller scale, the activation of the cooperative cell Ci is also influenced by the harmonic interactions between other cooperative cells within the same orientational ring, corresponding to a smaller, more immediate feedback loop within that structure, although calculated in this case at equilibrium. In the computer simulations, Equation 1 was computed once only to generate the oriented input, then Equations 4, 5, and 10 were computed iteratively using a Runge-Kutta algorithm until equilibrium was attained. The output images in Figure 6 represent the total cooperative activation Cr summed over all orientations r.
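The iteration to equilibrium can be sketched for a single cell. The differential equation below is an assumed generic shunting form (a decay term plus excitation gated by the distance to the upper bound and inhibition gated by the distance to zero), not a transcription of Equation 4. The single value `net_input` stands in for the combined oriented, cooperative, and harmonic inputs and is held fixed, whereas in the simulations those inputs are recomputed on every iteration. All names and constants are illustrative.

```python
def rectify_pos(v):
    """Positive half-wave rectification: keeps positive values, maps the rest to zero."""
    return v if v > 0.0 else 0.0

def rectify_neg(v):
    """Negative half-wave rectification: keeps negative values, maps the rest to zero."""
    return v if v < 0.0 else 0.0

def dC_dt(c, net_input, decay=1.0, bound=1.0):
    """Assumed shunting dynamics: decay, gated excitation, gated inhibition."""
    excite = rectify_pos(net_input)
    inhibit = rectify_neg(net_input)            # zero or negative
    return -decay * c + (bound - c) * excite + c * inhibit

def rk4_step(c, net_input, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = dC_dt(c, net_input)
    k2 = dC_dt(c + 0.5 * dt * k1, net_input)
    k3 = dC_dt(c + 0.5 * dt * k2, net_input)
    k4 = dC_dt(c + dt * k3, net_input)
    return c + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def run_to_equilibrium(net_input, c=0.0, dt=0.05, tol=1e-10, max_steps=100000):
    """Iterate until the activation changes by less than tol in one step."""
    for _ in range(max_steps):
        c_next = rk4_step(c, net_input, dt)
        if abs(c_next - c) < tol:
            return c_next
        c = c_next
    return c

excited = run_to_equilibrium(3.0)     # settles near bound * E / (decay + E)
inhibited = run_to_equilibrium(-2.0)  # inhibition cannot drive activation below zero
print(excited, inhibited)
```

With a positive input E the assumed equation settles at bound * E / (decay + E), so the activation stays below the upper bound however large the input becomes.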