This is an unusual manuscript that does not fit most of the standard molds. Its aim is ambitious: to provide a theoretical explanation for the apparent visual sensitivity to the global organization of a wide range of 2D spatial patterns. As the title indicates, the paper attempts to offer a plausible computational model for the apparently rapid and efficient visual representation of many different 2D patterns, including illusory contours, many forms of symmetry, and a variety of other forms of Gestalt organization. The paper can be significantly improved in many ways, but I think it offers a potentially valuable contribution.
From my perspective, one of the salient shortcomings of this paper is that it is too long. Generally speaking, the manuscript is clear and readable, but it uses more words than necessary. It also repeats and belabors some points and, in my opinion, wanders too far afield in discussing nonessential evidence with controversial interpretations. Many paragraphs are too long, diluting and camouflaging key ideas. Besides being more concise, the paper should also provide more information on several basic points.
The "perceptual modeling" approach adopted in this paper is not very popular in contemporary vision research (although similar approaches are more popular in cognitive science). One reason the approach may not attract immediate enthusiasm is that its explanatory principles are mainly physiological, whereas its evidence comes mainly from subjective phenomenology and computer simulations. Many readers are likely to regard this as too speculative. [author's response] The computational modeling ideas are interesting (to me, at least), but they are not used to model quantitative psychophysical or physiological data. Some of these organizational phenomena, such as illusory figures, are well known to be theoretically challenging, so perhaps they do not require quantitative psychophysical data to warrant study; but the lack of such data will weaken the paper in the eyes of many readers. [author's response]
The principal component of this paper is the computational model, but the author offers too little information about its details. I did not follow several important aspects of the model's computational processes: (1) What, exactly, produces the several orders of harmonic response? (2) How is the total harmonic response computed? (3) How or why does one particular harmonic order dominate the total harmonic response? (4) What is the nature and origin of the competition among the several harmonic orders? (5) What are the temporal parameters of these standing waves? (6) Are these standing waves conceived as occurring in a single layer, or do they develop from the dynamics of interactions between the "input layer" and the "feedback layer"? (7) Precisely what produces suppression of alternative groupings? (8) If these standing-wave patterns are produced by Gaussian diffusion from each point, then what governs the amplitude and spatial scale of these distributions? Does the spread increase with time, e.g., with the variance growing linearly, as in the heat equation? If I can read this paper carefully and still have such questions, then other readers will have them too. With such questions unanswered, the paper cannot be said to have clarified the phenomena it purports to explain. Additionally, I would like to apply this model to other results and phenomena in spatial vision, but the paper provides insufficient information to permit such applications and extensions of the model. [author's response]
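To make question (8) concrete, here is a minimal sketch of the kind of specification I am asking for. It assumes, hypothetically, isotropic diffusion with a single constant coefficient `D` from one point source on a unit-spaced grid; nothing here is taken from the manuscript, and the function name `diffuse_point` is illustrative only. Under the heat equation the per-axis variance grows linearly (sigma^2 = 2Dt), so the spatial scale (standard deviation) grows only as sqrt(t) while the peak amplitude falls. The manuscript should state whether its diffusion behaves this way or differently.

```python
import numpy as np

def diffuse_point(n=101, D=1.0, dt=0.2, steps=100):
    """Hypothetical illustration: explicit finite-difference solution of
    u_t = D * laplacian(u) on an n x n unit-spaced grid, starting from a
    unit point source at the centre. Returns the per-axis spatial variance
    of u after each time step."""
    # The explicit 2D scheme stays stable and non-negative only if D*dt <= 0.25.
    assert D * dt <= 0.25
    u = np.zeros((n, n))
    c = n // 2
    u[c, c] = 1.0
    x = np.arange(n) - c  # coordinates centred on the source
    variances = []
    for _ in range(steps):
        # Five-point Laplacian (periodic edges; the grid is large enough
        # that wrap-around mass is negligible for these parameters).
        lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
               + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)
        u = u + D * dt * lap
        marginal = u.sum(axis=0)  # distribution along the x axis
        variances.append((marginal * x**2).sum() / marginal.sum())
    return np.array(variances)

v = diffuse_point()
t = 0.2 * np.arange(1, 101)
print(np.allclose(v, 2 * 1.0 * t))  # variance tracks 2*D*t
```

Each explicit step adds exactly 2*D*dt to the per-axis variance, which is the discrete counterpart of the linear growth in the continuous heat equation.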
Additional links with the literature would be desirable. Two lines of work that are not referenced but appear potentially relevant are: (1) Blum's grass-fire model for visual representations of spatial patterns, and the related theoretical and experimental work by Steve Pizer and Christine Burbeck on such models for detecting symmetry and representing the medial axis of 2D forms. (I do not have these references at hand, but I know there are a number of publications extending over several years. If the author cannot locate some of them, it would be worth contacting Pizer, who is in Computer Science at the University of North Carolina, Chapel Hill.) I think the present model resembles the grass-fire model in some ways. (2) Christopher Tyler has an important line of work, some of it quite recent, on visual symmetry detection. I recently heard Chris give a fine presentation on a contrast energy model of symmetry perception, and I think he may have a recent publication on this work.
One of the author's arguments is that conventional models of form perception that begin with the detection of edges or other such features are insufficient for detecting the global organization of many types of forms, e.g., the camo triangle in Fig. 1a. References to computational evidence for this claim would be appropriate, since this limitation is not acknowledged in the standard literature. I believe the author is correct about this limitation of most template and neural network models, but many readers will not accept an unreferenced claim about it. I believe support can be found in the computer vision literature; I recall a paper making this point by a recognized authority on computer vision, whose name escapes me at the moment. [author's response]
Evidence from hallucinations and anthropology presented in the sections on "perceptual eigenfunctions" (p. 17) and the "psycho-aesthetic hypothesis" (p. 18), and in Figs. 8 and 9, is not convincing and serves only to weaken the paper. In my opinion, this line of evidence is best omitted. Perhaps it warrants a brief mention, but that should be restricted to one or two sentences. [author's response]
The final Discussion section, beginning on p. 32, is too argumentative and does not pull together the themes developed in the preceding sections. Its argumentative tone will lessen the paper's effectiveness for many readers. The last paragraph is unnecessarily defensive and weakens the author's persuasiveness. [author's response]
In general, this paper needs an extensive revision that trims much of the present material and provides more information about many details of the model. Despite the need for significant revisions, I would like to see this paper published.