The larger message of this paper is that our science as a whole took a wrong turn back in the 1960s following the discovery of the single-cell recording technique, which prompted the emergence of local point-like models to match the point-like nature of this new recording technique. This led to a wholesale abandonment of the field-like approach advocated by Gestalt theory. The digital computer represents the ultimate in local point-like calculation, where even spatial operations such as image convolution and the Fourier transform are computed sequentially, one point at a time, and the outcome of each point-like calculation is rigidly confined to a few well-defined variables. In the digital computer there is no field-like diffusion of information, there is no analog relaxation as in a soap bubble, and there is no reification, just abstraction, pure and sublime. The vast success of this device inspired the "AI" movement, whose proponents suggested that all problems of perception and cognition could be reduced to such symbolic terms. A whole generation of perceptual scientists has been raised in this intellectual paradigm, convinced that the phenomena of perception can be reduced to precisely determined local calculations. Only now are the limitations of this approach coming to light, although there is currently no consensus on exactly what the alternatives might be.
I do not propose to solve the problem of vision in one fell swoop. Indeed, what I propose is more like a giant step backwards from the heady optimism that the problem of vision is virtually solved, save for a few details. For the message of the Gestalt movement, today as before, is that the vision problem is vastly more complicated and involved than the reductionists, past and present, would have us believe. My proposal is advanced not so much as a solution to the problem of vision, but rather as a sobering reassessment of the magnitude of the problem at hand. What I offer is a critique of the assumptions underlying the simplistic approach that has dominated our field since the demise of the original Gestalt movement, and a general proposal indicating the direction in which the correction should be made.
The general message of this reviewer's final comment is that I should do as the others in my field do, which is to break off a manageable piece of the problem and reduce it to unambiguous computational terms. In so saying, he reveals that he has missed the whole point of the paper. I would love nothing better than to perform simulations to test these ideas. Unfortunately these ideas remain only vaguely formulated, as the reviewer is aware. Furthermore, even when they are eventually formulated more precisely, the nature of the proposed mechanism is such that on a serial computer these simulations will always suffer problems of spatial and temporal resolution, and a simulation of the full system would be impossibly slow. Unlike more abstracted models, a model of the nature proposed here cannot validly be broken into component modules that are simulated independently, for the system is designed to operate as a unified whole: the performance of each module would be significantly altered by separation from the rest of the system. In the same way, a dynamic simulation of soap bubble formation cannot produce the proper spherical result unless the entire bubble is modeled simultaneously, and a dynamic simulation of the Earth's atmosphere cannot produce accurate results unless the entire spherical Earth is modeled. If the Gestalt concept of biological computation is currently impossible to simulate with any precision, I maintain that it is better to offer vague and poorly specified descriptions of the system as best we can describe it, than to offer reductionist but fully specified models of parts of a system that bear no resemblance to the dynamics of biological computation.
While I do not doubt the excellence of the thinkers who have grappled with these problems for so long, I reserve my greatest respect for the Gestalt masters for their deep insight into the essential nature of the perceptual mechanism. If the reviewer considers conventional models to be generally no great departure from Gestalt theory, I respectfully submit that the reviewer is mistaken on this point, although he has plenty of company in so thinking. Sometimes even excellent thinkers can get caught in an ebbing paradigmatic tide.
It is true that intuitions should be tested. But the first step is to define and specify those intuitions in preparation for testing. This is exactly what Marr and Biederman have done with their intuition of the feed-forward hierarchical view of vision, which also remains to be tested. In the meantime, alternative intuitions should be equally exposed to the community precisely so that they can be tested, not necessarily by the original author, and not necessarily as a precondition to publication. If we insist on testing all hypotheses prior to publication, then only the simplest hypotheses will ever be published, which is exactly what has been occurring in our field for too long.
Finally, it is, let's say, unproductive to claim of one's own model that it `represents so great a departure from the conventional approach to modeling perception', when there are others, starting with Grossberg and associates, but also other schools, who have done similar work, but both mathematically specified and simulated.
I hope I have sufficiently communicated my highest regard for Grossberg's contribution to this field, in the use of dynamic systems models to emulate the dynamic Gestalt laws, the use of visual illusions to test those models, and the inspired use of boundary completion and surface filling-in to replicate observed perceptual phenomena. Indeed, I consider my own work to be a direct extension of Grossberg's work, advancing from his ideas in the same conceptual direction. However, I believe that it is premature for Grossberg to advance his models as neurophysiological ones, which is why I propose perceptual modeling to allow for yet undiscovered neurophysiological principles and mechanisms, such as those suggested in my thesis work. Also, Grossberg's failure to recognize his boundary completion and surface filling-in as manifestations of a more general principle of reification, together with his emphasis on neural mechanisms, led him to propose a model of depth perception that is less than fully reified, and is thereby not isomorphic with the subjective experience of spatial perception. The greatest departure of the Gestalt Bubble model from Grossberg's view of perception, however, is in its literal and explicit interpretation of the principle of isomorphism. This is also the most controversial aspect of this model, especially the full spatial reification, the reification of the perceived propagation of light, and the reification of amodal perception of hidden surfaces and of the space behind the head. It seems to me that this model must either be wrong, or it must represent a significant advance over the current consensus in vision modeling. It simply cannot be the case that this model is both right and of only minor significance to vision modeling, and I am amazed at this reviewer's suggestion that such is the case.