Reviewer A

Review of: "Computational Implications of Gestalt Theory: The Role of Feedback in Visual Processing" by Steven Lehar

For: Cognitive Psychology (99-068R)

Recommendation: Reject

Overview:

The paper makes several claims. 1) There is an advantage to modeling on a "perceptual" level rather than a neural level. 2) Gestalt theory suggests a need for multi-level reciprocal feedback (MLRF). 3) MLRF suggests a convolution and reverse convolution so that all levels of a system, which represent somewhat different information at different stages, are in agreement. The paper demonstrates computer simulations of this approach as applied to some visual illusions.

This is a very well written paper. The "tutorial" on edge processing is excellent, and much better than what is usually written in textbooks on image processing. The simulations are very nice (although I have a few quibbles).

My problem with the paper is with regard to the core ideas. The key characteristics of the resulting system end up being very similar to Grossberg & Mingolla's (1985) neural network model. The author argues that his system is only an example of the more general approach. Maybe so, but then he would do better to indicate the power of his approach by finding a conclusion that has not already been reached by other means. Moreover, I find the G&M path to this solution more justified and satisfying than the MLRF path. I'll discuss this in detail below. The net result is that I don't feel the paper fully lives up to its goal of demonstrating the MLRF approach.

Author's Response

I have other concerns about the invocation of Gestalt theory and MLRF in general. I discuss these issues in the specific comments below.

Specific comments:

1) Gestalt theory: In the Abstract, Introduction and elsewhere, the author seems to treat Gestalt theory as something that needs to be explained. This, I think, is a mistake. Gestalt theory is a collection of ideas and (partial) explanations. It is not experimental data that needs to be explained by models (neural network or perceptual). The Gestalt theorists did gather some quite interesting experimental data, and that data does need to be explained by theories. However, the following phrase (taken from page 3, middle of first paragraph) does not really make sense: "Gestalt theory also provides phenomenological evidence suggestive of some kind of top-down feedback,..." This treatment of a theory as a type of evidence is inappropriate. The discussion of Gestalt theory needs to be revised to indicate that what the author is really trying to do is combine ideas from Gestalt theory with data from neurophysiology and psychophysics.

Author's Response

2) Reification: The paper proposes that top-down processing attempts a type of reification, or inversion of bottom-up (abstraction) processes, so that every level of the system is in sync. Page 5 states the focus of the paper as: "...how this reification might occur in general..." OK, fine, but that still leaves unclear exactly how this is to be done. More generally, it leaves unclear what the types of abstraction should be. Later sections of the paper describe an example of explaining illusory contours, but the mechanisms (edge detection, boundary completion, filling-in) are seemingly pulled out of thin air (or from other models). Does the approach not give any justification for these types of processing? I guess it is OK if the model does not, because it still makes the claim that every type of processing is also reified, which is a novel claim. Still, it limits the applicability of the model.

Author's Response

Early on (page 5, middle of bottom paragraph), the text states that top-down processing "... attempts to reify every possible variation ... simultaneously ...". But later in the text (page 31, first full paragraph) it is noted that the apolar cooperative image does not produce any feedback to the image level. The latter is quite reasonable, because the apolar cooperative image does not contain any information about what the colors should be at the image level. However, the absence of feedback indicates a limitation of the reification approach: it does not occur at every level. Now, it could be claimed that signals are fed back, but that they cancel each other out at every location. That needs to be at least stated explicitly, if not directly modeled. And it needs to be explained why and how they cancel each other out.
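
To make the cancellation possibility concrete, the following is a minimal one-dimensional sketch. It is a toy construction for illustration only, not the author's simulation: the step "image", the three-tap edge kernel, and the assumption that an apolar layer feeds back equally through both polarity kernels are all hypothetical choices made here.

```python
def correlate_same(signal, kernel):
    """Slide an odd-length kernel over a 1-D signal (zero padded), same length out."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(signal):
                total += w * signal[k]
        out.append(total)
    return out


# Hypothetical 1-D "image": a dark-to-light step.
image = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Assumed odd-symmetric edge kernel; its negation responds to the opposite polarity.
edge_kernel = [-1.0, 0.0, 1.0]
opposite_kernel = [-w for w in edge_kernel]

# Bottom-up: polar edge response, then an apolar (sign-free) magnitude.
polar = correlate_same(image, edge_kernel)
apolar = [abs(v) for v in polar]

# Top-down (assumption): the apolar layer carries no sign, so it feeds back
# equally through both polarity kernels.
feedback_pos = correlate_same(apolar, edge_kernel)
feedback_neg = correlate_same(apolar, opposite_kernel)

# Net signal arriving at the image level is the sum of the two pathways.
net_feedback = [p + n for p, n in zip(feedback_pos, feedback_neg)]
print(net_feedback)
```

Under these assumptions each pathway carries a nonzero signal, yet the sum is zero at every location simply because the two kernels are negatives of one another. Whether the manuscript intends anything like this is exactly what would need to be stated.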

Author's Response

4) Justification for processes: After reading the paper I am left wondering why the visual system would be built with the levels proposed by the MLRF approach. The current model seems to be built to account for the percepts of illusory contours. But the visual system surely did not evolve to allow it to see Kanizsa triangles! What are the constraints that cause the visual system to produce modal and amodal percepts? Why is there grouping among edges?

The answers to such questions are not necessary at the beginning of an investigation, but they should always be in the background, at least. Moreover, the author already describes the answers in the context of Grossberg & Mingolla's theory. His own theory has no answers. It simply notes that there can be an amodal representation of contours that is distinct from the modal percepts that keep polar information. Why it should be that way is not addressed.

Author's Response

Some phrasing on page 33, middle of the page, has the situation reversed, I think. It is discussing Grossberg & Mingolla's model and says its mechanisms were "... presented as specific computational strategies employed to account for specific illusory phenomena. The present proposal is more general, for it suggests a general strategy of information processing ...". In my opinion, this is exactly backwards. Grossberg & Mingolla's model was an attempt to identify foundational computational properties of visual processing, which also happened to account for specific illusory phenomena. The illusory phenomena are explained by the model because the model was not directly designed to account for them (although it was sometimes guided by them). In contrast, the current proposal, by directly attempting to model the percepts of illusory phenomena, leaves us wondering whether it will work for non-illusory phenomena. (I happen to believe it will, but primarily because the proposal is much like the G&M model.)

Author's Response

5) The (nearly) bottom line: So the issue boils down to whether the current recasting of the ideas in G&M in terms of inverse transformations is worth publication in an archival journal. I don't think so. The inverse transformation approach leaves a number of issues as loose ends (e.g., the existence of the apolar cooperative image).

I am not saying the inverse transformation approach is wrong; it may very well be right. However, it cannot, by itself, guide the development of models of visual perception; it needs other computational constraints to guide it. It would be quite interesting if the inverse transformation approach ends up being correct. It could be a principle that helps guide the development of specific models. However, it needs some evidence to support it. I agree that many of the components of the G&M theory can be interpreted in the context of the inverse transformation approach, but one example is not enough to verify the approach, especially since the example was built with different principles in mind. At the moment this looks too much like post hoc theorizing.

Author's Response

6) A minor point: the basic ideas here are quite reminiscent of many other models. The statement on page 8 (3rd line) that these concepts have not been incorporated in computational models is not true. Geman & Geman (1984) used simulated annealing to satisfy multiple constraints from different sources of information. Many of the connectionist networks have similar properties. More recently, a variety of relaxation algorithms have been used to deal with some of the issues identified by the Gestalt psychologists. If the author somehow thinks these approaches are all missing fundamental issues, then he needs to (at least briefly) discuss and dismiss them.

Author's Response