Review for Cognitive Psychology, MS#99-068R: 'Computational implications of gestalt theory: the role of feedback in visual processing,' by Steven Lehar.
This paper introduces a computational model of human visual processing that instantiates some of the cooperative or 'field-like' properties of the classic Gestalt theory of vision.
Alternative neural network models (primarily the BCS/FCS model of Grossberg and colleagues) seek to account for many of the same phenomena (e.g., perception of the Kanizsa triangle, modal and amodal completion, the Craik-O'Brien-Cornsweet illusion, simultaneous contrast, etc.) in terms of presumed underlying physiological mechanisms. In contrast, the model presented here explicitly avoids neural interpretation.
A key computational feature of the current model is the use of feedback to 'reify' the abstractions computed at higher levels. For example, the configuration of recognized edges in the Kanizsa triangle percept is extracted by a series of hierarchical filtering processes, and the outputs of the higher levels are then fed back to lower levels to help generate an emergent percept of visible surfaces.
This model clearly borrows many of the properties of the BCS/FCS model of Grossberg and colleagues. It differs in some of the specific mechanisms that it proposes to account for many of the same phenomena, but it is generally quite similar to the BCS/FCS model both in spirit and in its presumed computational stages. The author's most significant point of departure from Grossberg's approach is his disavowal of the research program of attempting to identify processing stages in the model with actual neural processes in the brain.
In my view, stripping away the identification of computational stages in the model with neural mechanisms undermines the justification for the types of filtering and feedback processes that make up the current model. Why should the reader believe that Gestalt field effects are realized in terms of the computational processes suggested here, rather than by, say, the principles of electrostatic self-organization originally suggested by Köhler, if the justification is not that this is the way that neural circuits really work?
If justification is not to be given in terms of neural plausibility, I think that it behooves the author to provide some alternative justification. The alternative justification might be given in terms of computational optimality or clever psychophysical probing of information processing mechanisms in vision (i.e., new perceptual data). The perceptual data provided in the manuscript is not sufficient to decide between this model and other published models. Without some sort of justification for belief in this particular visual processing model, publication in Cognitive Psychology would be inappropriate.
On the positive side, the model introduced here suggests some novel computational strategies for performing some of the same functions that are performed by the BCS/FCS model. In particular, the current model uses feedback differently from the BCS/FCS model. If an argument were given for the superiority of the current model in accounting for perceptual data, then a paper describing the model and its novel properties would make a significant contribution to the cognitive psychology literature. But that would be a different paper.
If existing neural or psychological data is insufficient to distinguish the plausibility of the model proposed here from that of its rivals, a paper describing the novel features of this model might nevertheless be a significant contribution to the modeling literature. But without new psychological data or neural justification, publication in Cognitive Psychology would remain inappropriate.
1) Abstract: The comment about 'a more parallel processing strategy' seems misleading, since the distinction that the author wishes to emphasize is between feed-forward and feed-back processing, rather than between serial and parallel processing.
2) p.3: 'As soon as the global triangular form is recognized, the low level visual edges...appear at higher resolution' (Is this true?). 'modal illusory edges and surfaces ...are virtually indistinguishable from actual edges and surfaces' (This isn't true.)
3) p.4, top: '...at the level of the retina, the rods and cones respond to absolute light intensity.' The author means to distinguish the photoreceptor stage of processing from later stages at which spatial differentiation takes place, but photoreceptors do more than just register absolute light intensity. What about adaptation, Weber's law, etc.?
4) p.4, top: The description of a complex cell as connecting pairs of simple cells that respond to opposite contrast polarities is overly simplistic. Simple cells actually respond to the full spectrum of spatial phases.
5) p.7, top: The 'unless we invoke mystical processes beyond the bounds of science...' comment does a disservice to Dennett and O'Regan, who presumably believe that filling-in occurs on the basis of default assumptions instantiated somewhere in the brain's neural network.
6) p.7, bottom: Supply reference(s) for the statement that the Kanizsa triangle 'is filled-in with a white that is brighter than the white background of the figure.' Is this always true?
7) p.8, middle: I don't believe that Grossberg and colleagues would agree with a characterization of their model that says that the FCS is identified with modal percepts and the BCS with amodal percepts. In more recent versions of Grossberg's theory (FACADE theory), the BCS supplies information that allows higher level representations to be accessed, so that both modal and amodal figures can be recognized. The FCS is associated with the visible perception of surfaces.