Reviewer 1

Ms. 2533

Review of Gestalt Isomorphism I and II, revised version

As I have noted in previous reviews of this paper, it contains many interesting and innovative ideas that deserve to be presented to the perception community, and it shows an advance over the earlier versions. I still disagree with the authors' explanation of illusions as misapplied constancies, but this is a legitimate difference of opinion. However, in my judgement the paper still shows a number of weaknesses that I will document below. My main complaints are, first, that in a number of cases the authors' accounts may apply to certain select cases but not to other, equally legitimate ones, which they fail to discuss; second, that some aspects of the working of the model are unclear; and third, that some modeling solutions lack psychological credibility.

Author's Response

Paper I, Section 2.4

The authors criticize the BCS/FCS model of Grossberg and collaborators as being unable to differentiate a dark gray square on a black surround from a white square on a light gray surround, because the targets have the same reflectance ratio with respect to their surrounds. Such stimuli are indeed problematic for pure local ratio theories, but local ratio sensitivity is only one aspect of the BCS/FCS model, which also includes the influence of more remote ratios, as shown, for example, by simulations of one version of assimilation in the 1988 paper. Furthermore, the more recent version of this model, by Pessoa, Mingolla & Neumann, includes sensitivity to absolute luminance levels, which would easily enable it to deal with the above stimulus. Finally, the authors discuss this issue in the context of the importance of the illumination pattern for lightness perception, but in this particular situation the illumination does not vary across stimulus regions.

Author's Response

A critical issue in the perception of illuminated scenes is how the visual system distinguishes illumination edges from reflectance edges. The authors attempt to solve this 'edge classification' problem by proposing that borders of larger regions tend to be classified as illumination edges, and borders of smaller regions as reflectance edges. This idea may work in their example image, but it has no generality, because a small shadow can be correctly recognized when it falls across the reflectance edge between two larger regions with different reflectances. There is no general rule that shadows are larger than neighboring objects: big objects can cast shadows on small objects, but small objects can also cast shadows on big objects. Also, the authors still fail to note that their scheme is a neural model of the psychophysical 'albedo hypothesis', noted already by Koffka (1935).

Author's Response

Paper II, section 3.3

The explanation of the 3-D appearance elicited by 2-D images involves a mutually orthogonal completion in the 3-D representation. This may work for the class of objects whose surfaces are in fact mutually orthogonal, but it is not clear to me how this scheme would generalize to objects whose surfaces are not mutually orthogonal (for example, crystals). Would it involve, in addition to the proposed 'orthogonality fields', additional fields involving other angles?

Author's Response

Paper II, section 3.4

The authors account for the percept of a rectangle induced by a trapezoidal stimulus by a perspective distortion of the volumetric representation. However, they point out that, within this scheme, such a stimulus is still compatible with an infinity of other perceptual interpretations. They appear to offer a solution to this problem by noting that the other (non-rectangular) interpretations are geometrically irregular, with unequal sides and odd angles. Such an account may explain the perception of some regular figures, but what about the correct perception of figures which are in fact irregular and do have unequal sides and odd angles (as most objects do)? To repeat my earlier question, how do we perceive a trapezoid (which projects as a trapezoid) as being a trapezoid, whatever its inclination? All the authors now say in the text is that 'other possible interpretations might be suggested by different contexts', but this is not a real answer unless it is explained what these contexts are and in what manner they affect the percept.

Author's Response

Paper II, section 4

The authors note that in size perception experiments with two objects of equal physical size but at different distances from the observer, depending on whether the instructions are objective or projective, subjects can judge both that the distant object is equal in physical size to the near one and that it subtends a smaller visual angle. They explain this latter perceptual capacity by assuming that the further object is 'experienced at a lower perceptual resolution'. This notion of perceptual resolution is not clear to me. Does it mean that the further object is seen as more blurred? But, when the further object is fixated, it is the nearer one that may appear more blurred, without being perceived as smaller. Thus this notion apparently does not refer to experienced visual resolution. It would be important to clarify the status of this notion and how its role could be empirically tested against other possibilities, such as that it is simply the angular retinal size of the stimuli that may be to a certain extent accessible to subjects. The authors talk about the decrease of depth resolution as a function of depth, but how could the system step outside itself and come to know that it has this property? I note that the authors still claim that their model explains why shrinking objects are perceived as receding, but that they have not answered my query as to how in their model a shrinking object (such as a balloon) can be correctly perceived as shrinking.
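To make the angular-size alternative concrete, the geometry involved is elementary: objects of equal physical size at different distances subtend visual angles in inverse proportion to distance, and this is the quantity that might be partly accessible to subjects under projective instructions. A minimal sketch (the sizes and distances are arbitrary illustrative values of my own, not taken from the papers):

```python
import math

def visual_angle_deg(size: float, distance: float) -> float:
    """Visual angle (degrees) subtended by an object of a given
    physical size viewed frontally at a given distance."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

# Two objects of equal physical size (1 unit), at distances 2 and 4.
near = visual_angle_deg(1.0, 2.0)
far = visual_angle_deg(1.0, 4.0)
# The far object subtends roughly half the visual angle of the near one,
# even though its judged physical size can remain equal under
# objective instructions.
```

The point of the sketch is only that 'access to angular retinal size' is a well-defined alternative hypothesis that could be tested against the authors' notion of perceptual resolution.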

Author's Response

Paper II, section 5.

Referring to figure 10b the authors claim that 'it is easy to point in the approximate direction of the perceived illumination source'. The position of the arrow on the left surface in figure 10c suggests that this direction is normal to the surface. Later they point out that 'the illumination could actually be coming from a range of angles near the normal' and propose a scheme in which the highest probability is assigned to the normal direction and increasingly smaller probabilities to angles increasingly deviating from the normal. But, to account for the shadedness of the two sides in figure 10c, the light source might in fact be positioned very far away from the normal direction, that is, anywhere in the large region in front of the left surface, except that it could not be so far to the right that it would illuminate the right surface more than the left one (assuming both surfaces have the same reflectance; without this assumption, all bets are off). Thus there is no reason to single out the normal direction from all other directions. Similarly, in the initial processing step (figure 12a) the assigned most probable location for the sun is right above the cube. But, the only time that the sun is in fact right above a horizontal surface is on some days at noon in the equatorial region of the earth. In all other cases the light rays fall more or less obliquely upon horizontal objects; this also applies to surfaces of almost all orientations, except when they happen to be oriented directly towards the sun. So why would the system incorporate the assumption that the most probable direction of the light source is normal to a surface when it is ecologically invalid and almost always wrong?
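As I read it, the proposed scheme amounts to a prior over light-source directions peaked at the surface normal, something like the following (the Gaussian form and its width are my own assumptions made for illustration; the authors specify only 'increasingly smaller probabilities' away from the normal):

```python
import math

def illumination_prior(deviation_deg: float, sigma_deg: float = 30.0) -> float:
    """Unnormalized probability assigned to a light-source direction
    deviating by `deviation_deg` from the surface normal. The Gaussian
    fall-off and sigma are illustrative assumptions, not the authors'."""
    return math.exp(-(deviation_deg ** 2) / (2 * sigma_deg ** 2))

# The scheme assigns its maximum to the normal direction ...
p_normal = illumination_prior(0.0)
# ... and progressively less to oblique directions, even though, as
# argued above, oblique illumination is the ecologically common case.
p_oblique = illumination_prior(60.0)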

Author's Response

Another aspect of the proposed scheme is to assume a 'dark illuminant' (emitting, I suppose, some 'black light'?) upon darker object surfaces. While this idea has its charm, and could be used as a technical trick in a computational vision algorithm, such an assumption is very problematic for a model that is supposed to account for perceptual data in a more veridical manner than competing models, by following closely the phenomenology of perceptual reports. The problem is not so much that the notion of a 'dark illuminant' is physically unrealistic but that it is completely foreign to everyday perception. There just are no dark illuminants, nor are any perceived. Any model that seriously entertains the psychological reality of dark illuminants (or even two of them, as in figure 12b) is suspect as a model of perception.

Author's Response

Paper II, section 5.2

What I find most troublesome in this work is the new section containing the reverse ray-tracing algorithm, as far as I understand it. It is obviously of great importance for the authors, because they say that it is this aspect of their approach that offers a solution to 'many troublesome issues in visual perception' such as transparency, specularity, mutual illumination, various types of shadows etc. The main problem that I see with this idea is that it appears to be a computational algorithm with no shred of evidence for its psychophysical reality, and therefore, in my judgement, would be poorly received by perception researchers. The very idea that the perceptual system incorporates a 'model of physical propagation of light through space' sounds very far-fetched and does not have any phenomenological support. In everyday perception we just don't see any light rays, we see objects in space. Of course we now know that at any point of a surface light comes from many directions and is partly absorbed and partly reflected in many directions, and that through any point in empty space one can envisage a multitude of light rays criss-crossing through all directions. But these notions have slowly evolved during the history of science, and none of these rays as such are present in our direct experience of the world. So how could a perceptual system have come up with these ideas in the first place, not to mention implementing them in every portion of the represented space in a detailed, physically accurate manner, as explicitly proposed in the model? In my judgement this is just another instance of the brain being modeled after the latest technological invention, in this case ray-tracing algorithms from computer vision. Once one assumes such a system, it is of course 'easy' and 'natural', as claimed by the authors, to add features such as transparency, mutual illumination, or highlights and thus to take care of these troublesome issues.
However, such an approach appears to me to replace the perceptual account of a phenomenon with a physical-engineering model of its generation. One wonders, for example, why the authors have gone through the trouble, in paper I, to solve the problem of modeling the perceptual distinction between reflectance and illumination edges (criticized above), when they now say that the algorithm 'would automatically handle' every conceivable kind of shadow. The authors even propose to include into their model an account of refraction of light through transparent surfaces, which, if I understand correctly, would explain why objects viewed in part through such a surface appear dislocated. But refraction is a strictly physical affair which explains how the stimulus is generated, that is, how the light rays coming from the object are in fact dislocated through refraction; this is a fact that ray-tracing algorithms have to take into account and that perceptual theorists should be acquainted with, but it is out of place in a model of perceptual mechanisms.

One problem with such a model is that, being so knowledgeable about the physical situation, it is likely to surpass the performance of normal perceivers, and thus may not serve as a good predictor of their performance. For example, the authors claim that a 'landscape of rolling green hills that become more blue with perceived distance from the observer would tend to be interpreted perceptually as being of uniform green, with the blue tint being attributed to the filtering effect of the semi-transparent atmosphere'. To me, and I think to other people as well, the distant parts do appear to be bluish themselves, and although I know, cognitively, that the effect is due to the atmosphere, perceptually I observe that the color is not uniform and that the distant objects themselves are of a different, more bluish surface color than the more nearby ones. Another problem is that real observers appear not to notice, let alone be disturbed, when the shadows of objects in a photograph are manipulated by computer so as to be geometrically contradictory (Pawan Sinha has some delightful demonstrations), whereas for such a model this should be immediately evident. In sum, it is not clear to me how a model which would incorporate such an extensive, detailed, and essentially correct knowledge of the external physical situation would ever fall prey to a perceptual illusion.

An additional problem for me is to understand how the model starts working at all. A computer vision ray-tracing algorithm starts from the knowledge of the geometry and photometry of a scene and then calculates the properties of the corresponding image. A visual system, on the other hand, must start from the image to recover the properties of the scene. The description of the model in section 5.2 starts with the identification of the direction of the light source (which I have criticized above). Note that at this moment, as far as I understand the model, only the direction but not the intensity of the source is given. Light, as represented in the model, then proceeds from the source in all directions, exactly as in the physical world. The authors then say that 'whenever the light signal encounters a perceived surface (i.e. elements in the opaque state), the elements representing that surface take on a surface illuminance value which is proportional to the total illumination striking that surface from any direction'. But, it is not clear to me where that value is supposed to come from. How does the system, at this point, know about these intensities? They go on to say that 'a second variable of surface reflectance is also represented by every element in the opaque state'. But the perceived reflectance is exactly the main problem for lightness algorithms to recover. Finally, they say that the percept of brightness is the product of perceived illumination and perceived surface reflectance. This sequence of events mimics the physical sequence in which the (assumed known) values of illumination and reflectance are multiplied to get the value of luminance, but the perceptual system starts by 'knowing' luminances or brightnesses, and in that case illumination and reflectance are two unknowns in the equation.
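The underdetermination at issue in the last point is easy to make explicit. Physically, luminance is the product of illumination and reflectance; given only the luminance, the two factors cannot be recovered, since any pair (E/k, kR) yields the same product. A sketch with arbitrary illustrative values of my own:

```python
def luminance(illumination: float, reflectance: float) -> float:
    """Physical image formation: luminance is the product of the
    illumination falling on a surface and its reflectance."""
    return illumination * reflectance

# One and the same luminance (0.3) is produced by very different scenes:
# a mid-gray surface under moderate light ...
a = luminance(0.6, 0.5)
# ... or a darker surface under strong light.
b = luminance(1.0, 0.3)
# Given only the luminance, illumination and reflectance are two
# unknowns in a single equation; the model's multiplication runs the
# physics forward, whereas perception must run it in reverse.
```

This is exactly why recovering perceived reflectance is the central problem for lightness algorithms, rather than something that can be assumed as given.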

Author's Response

A general comment is that although the authors now clearly point out that their model is more of a framework than a fully specified and simulated model, they still include what I feel are somewhat unfair comparisons with other modeling work. For example, in paper I, section 2.2, they criticize models that strive for neural plausibility for 'unnecessary complexity', because in their view such models, for example, include separate modeling of different classes of cells, and praise their own approach as being free from such constraints and thus enabling a more direct modeling of the percept. However, it is also the case that a model that strives for neural plausibility has, exactly in virtue of that property, an advantage over a model that does not strive for it.

Also, in paper II, section 3.4, they claim that 'perspective cues offer another example of a computation that is inordinately complicated in most models', without citing any of those models or discussing the nature of those inordinate complications. In fact, although they do specify some of the mathematical properties of the units involved in their model, they do not present, as other modeling work does, any of the actual computations that take place when the model encounters a stimulus, nor any simulations of the outcome of these computations, so that the reader cannot compare the computations within this model with those involved in other models, and thus cannot judge which are the more 'inordinate' ones.

Finally, in connection with their reverse ray-tracing model, the authors do point out, in paper II, section 6, the difficulty of the computational tasks faced by existing computer vision algorithms of this type. Their solution is 'a parallel ray-tracing algorithm that follows all light paths simultaneously' which would involve 'a multitude of relatively simple computations'. Quite apart from the fact that this is just presented as an informal idea and that actual parallel programming is far from simple, it is also true that most neural models which are composed of many interacting units involve relatively simple local computations, so that the authors' model has no special claim on simplicity in this respect.

In addition, I still think that the authors are not careful enough to distinguish their own vision of gestalt theory from the actual gestalt literature itself. For example, in paper I, section 1.3, they explicitly claim that 'Köhler argues that such a model cannot...', where the simple fact is that Köhler does not argue that, if only because when he wrote in 1947, the on and off cells involved in the mentioned model had not yet been discovered. The argument may be in the spirit of Köhler, but that is a different thing from attributing it to Köhler himself. Also, I agree with the other reviewer that Köhler's notion of isomorphism is different from the authors'. In their reply to the other reviewer the authors claim that the principle of isomorphism implies modeling the information encoded and not the mechanism itself. This is an interesting idea, but I do not think that what Köhler talks about is quite the same as the encoding of information as distinct from the mechanism.

Finally, in their closing statement the authors talk about the 'illusion of consciousness' as being the key inspiration of the Gestalt movement, 'from which all of their other ideas were developed'. If they really believe this to be a historical fact, they should at least make a list of all the gestaltists' 'other ideas' and document for each of them exactly how it was developed from the 'illusion of consciousness'; otherwise this is just a piece of rhetoric that might sound like a nice way to conclude the paper. Furthermore, I do not see in Köhler's intricate discussions the basis for any such blunt statement as the authors' claim that 'the world we perceive around us is an illusion'.

Author's Response