Reviewer 1
Review (Ref. No. 2533 LS/PB) of Lehar's paper entitled 'A gestalt bubble model of the interaction of lightness, brightness, and form perception'
Note on [Author's Responses] in this document.
Lehar's paper is an ambitious undertaking in modelling a wide variety of perceptual phenomena. Rather than providing the computational details of the model, the author delineates a general approach to perceptual modelling, stressing lower level mechanisms rather than cognitive strategies.
He begins with some general remarks concerning abstraction vs. reification, isomorphism, perceptual vs. neural modelling, and issues of consciousness.
He presents a short description of Grossberg's BCS/FCS model and proceeds by providing modeling solutions to a number of perceptual phenomena. They involve brightness contrast and assimilation, perception of illumination, and perception of 3-D form for which he proposes what he calls a Gestalt bubble model. He then describes a representation of 3-D space involving an azimuth / elevation / vergence co-ordinate system. He uses this representation to explain the interaction of representations of form and brightness. He ends with discussions of transparency, neon-color spreading, shape from shading, and shadows.
There are a number of aspects of this paper that I like. They include the stress on lower-level processes, the principle of isomorphism and the importance of explaining appearance, the need for a full spatial representation of 3-D forms, and the interaction of local and global processes. Also, the paper contains a number of interesting observations and creative insights. However, I cannot recommend this paper for publication in its present form. The reason is that I find a number of serious problems with Lehar's modelling solutions, listed below in six points.
As minor issues, I note that, while advancing the need for a full 3-D spatial representation in section 1.4, the author criticizes Marr for the 2-D-ness of his 2 1/2-D sketch, but forgets that Marr also talks about a 3-D representation; [Author's Response] also, he claims that Koenderink proposes a nominal representation in a 2-D map, whereas in fact Koenderink explicitly deals with Gaussian curvature which is a quantitative indicator of curved surfaces in 3-D space, and the nominal classification only reflects its sign. [Author's Response] In addition, although the author's approach is claimed to be based on the work of the gestalt psychologists, he does not seem to have consulted their original writings too thoroughly. For example, he never cites Koffka's Principles of Gestalt Psychology, where the analogy with the bubble is elaborated. [Author's Response] On the other hand the authority of Koehler is invoked in section 1.3 to claim that 'in an isomorphic model there must be some cell or variable or quantity at the center of the square which is "on" when a whiteness is perceived at that location and "off" when no whiteness is perceived there'.
Such a claim, and others in that section, go much beyond Koehler's cited 1947 book. [Author's Response] Furthermore, one wonders how, say, a light gray would be represented (light gray cell "on", and white, medium gray, dark gray, and black cells "off"? Would each gray level have a cell of its own? If not, then why would the white have one?). [Author's Response] Finally the discussion of principles of abstraction, compression, and invariance in section 1.1 would need a conceptual clarification: as each principle apparently involves the other two, and they are all said to be 'intimately connected', it is not clear what the differences between them are, and whether a single principle would suffice. [Author's Response]
1.
In section 2.2 the author tries to improve the BCS/FCS model by including an account of both brightness contrast and assimilation, by proposing that 'brightness contrast occurs between figure and ground ... whereas brightness assimilation occurs within the figure or within the ground...'. He claims that in the examples in his Figure 7 the fragments 'appear as multiple components of a single larger form, rather than individual figures against a common ground'. There are three problems here. First, one could disagree with this particular phenomenological figure-ground description, especially for displays A and B involving gratings. [Author's Response] Second, how would the model implement the parsing of a display into figure and ground? This is a far from trivial problem, and a model that involves this distinction must at least address that issue, even if it does not supply a detailed implementation. [Author's Response] Third, in Helson's (JOSA, 1963, 53, 179-184) classical research (which the author does not cite) it is shown that in grating displays both contrast and assimilation can occur, depending on the widths of the black, gray, and white stripes (in fact, in displays A and B, which are presented as examples of assimilation, I see contrast, not assimilation). It is not clear how a model that only involves the figure-ground relation can accommodate such data. [Author's Response]
2.
In section 2.3 the author proposes a solution to the problem of factoring the image into reflectance and illuminance components. It involves a mutual inhibitory competition between illuminance and lightness nodes, both fed by a brightness node which itself is affected by stimulus luminance. This idea is related to the so-called albedo hypothesis (not cited by the author), according to which perceived brightness is a product of perceived illumination and perceived lightness, in the same way as luminance is a product of illumination and reflectance. I see two problems with his proposal. First, in his model the lightness node and the illuminance node are completely symmetrical: both are excited by the brightness node and each inhibits the other. How would such a symmetrical arrangement lead to the generation of two structurally quite different outcomes, the illumination percept and the lightness percept (Figures 8A and 8B)? It might, but it might not, and without a simulation one just does not know. [Author's Response] Second, as Gilchrist has pointed out, and as the author notes himself, it is the nature of intersections in an image like figure 8A that is probably involved in the factorization of the image. However, in the proposed model this is not taken into account, and only the processing of luminance is discussed. [Author's Response] Incidentally, the author misrepresents the Gelb effect when he claims that the illumination source must be 'carefully concealed from the observer'. It does not have to be, and the effect does not disappear 'as soon as the observer is made aware of the intense illumination source', for example by telling observers about it or even showing them the source. However it does get diminished when a genuinely white object is introduced to the field of view.
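The factorization problem raised in this point can be illustrated with a minimal sketch. It assumes only the physical relation stated above, that luminance is the product of illumination and reflectance; the function name and the numerical values are hypothetical, and nothing here is taken from the author's model.

```python
import math

def luminance(illumination: float, reflectance: float) -> float:
    """Luminance as the product of illumination and reflectance."""
    return illumination * reflectance

# Hypothetical values in arbitrary units (not taken from the paper).
light_gray_dim_light = luminance(illumination=100.0, reflectance=0.5)
dark_gray_bright_light = luminance(illumination=200.0, reflectance=0.25)

# Both surfaces send the same luminance to the eye, so luminance alone
# cannot determine which factorization is the correct one.
same_luminance = math.isclose(light_gray_dim_light, dark_gray_bright_light)
print(light_gray_dim_light, dark_gray_bright_light, same_luminance)
```

Because one luminance value is compatible with many illumination/reflectance pairs, any model of the factorization has to draw on information beyond local luminance, such as the intersections mentioned above.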
3.
In sections 3.2-3.3 a 'gestalt bubble' model of perception of 3-D form is developed, which intends to be more general than Adelson's rod and rail model, described in section 3.1. As in the above discussion, I have a general criticism that without a simulation we just don't know whether the proposed idea works. [Author's Response] The author says that after some processing 'the entire system attains a globally stable configuration, the most stable state being one which resolves conflicting surface interpretations in the most natural, or lowest energy manner, by the Gestalt principle of praegnanz'. However, not much is specifically said about how such natural conflict resolution is attained, nor is the apparently crucial notion of energy even defined, nor do we learn in what way praegnanz might actually work. [Author's Response] An instructive example of the need for simulation is found in the discussion by Leclerc & Fischler (Int. J. of Comp. Vis., 1992, 2, 113-136) of a line-drawing model by Marill. They show how that model accounts very well for some drawings, but fails miserably on others. This analysis was only possible because, in contrast to the author's model, Marill's model is mathematically specified and could be simulated. A second criticism is that the model only involves cases of coplanarity and orthogonality, and does not even mention perceived oblique angles between surfaces, or cases of curved surfaces, which are easily induced by line drawings. [Author's Response] Third, the concrete cases analyzed by the author involve single lines, two lines, a square, and the Ehrenstein illusion. No case of a real 3-D form, such as a corner or a cube, is discussed in sections 3.2 and 3.3. [Author's Response] Fourth, the author fails to address a crucial problem in line drawing perception, namely that for some reason some line drawings are perceived as 2-D forms and others as 3-D forms.
Dozens of such examples are found in Butler (JEPHPP, 1982, 8, 674-692) and other papers cited there. [Author's Response]
Finally, the author writes that 'when two abrupt line endings are presented opposite each other, they tend to break away an occluding surface between them ... and furthermore, they tend to propagate behind the occluding surface resulting in an amodal completion of the collinear edges behind the occluding surface. This model therefore explains the phenomenon of amodal completion...'. In my judgement, what is given here practically amounts to a description of amodal completion, but not to an explanation, at least until we learn more about how such a propagation takes place. [Author's Response]
4.
In section 3.4 (due to a typo denoted as 3.2) the author deals with the depth dimension. The main problem I find here is the idea that 'the transformation from the infinite Cartesian grid ... to the angle / angle / vergence representation ... represents a perspective transformation of the Cartesian grid...'. I think that this idea, insofar as I understand it, is mathematically incorrect. There is a difference between a co-ordinate system transformation and a perspective projection. The author claims that his transformation converts straight lines (such as the horizontal line of the road or the vertical edges of houses) into curved lines. This is true when such lines are projected onto a spherical surface, but it is not true for the presented co-ordinate system transformation. In such a transformation the straight grid lines in the Cartesian system are not mapped into the curved lines of the spherical system. I will try to explain this point with a simpler case. Consider a 2-D x-y Cartesian system, with horizontal and vertical grid lines (x=const and y=const), and a 2-D r-phi polar system whose grid lines are concentric circles (r=const) and radial straight lines (phi=const). The point is that it is not the case that a change of co-ordinate system maps the Cartesian grid lines into the polar grid lines. For example a straight line in the Cartesian system (say with the equation y=C) remains that same straight line when a polar system is used; it just has a different mathematical representation (in this case r=C Sqrt[1+cotan^2[phi]]). Genuine co-ordinate system changes do not change the attributes of objects, only their co-ordinates. To return to the author's idea, what his transformation does is to compress the space in depth, due to vergence (which is not a co-ordinate change), but fronto-parallel horizontal and vertical lines, which are unaffected by depth compression, would remain straight and not curved, as in his figure 17.
Now, there is a way to transform these straight lines into curved lines, and that is to project them on a spherical surface, like the retina. The problem, of course, is that one loses the depth dimension by doing so. There could be other ways, in which all three dimensions are retained, but the author's way is not one of them.
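The 2-D polar example in this point can be checked numerically. The following is a minimal sketch under stated assumptions: the constant C, the sample x values, and the helper names `to_polar` and `to_cartesian` are hypothetical choices for illustration. It re-expresses points of the line y=C in polar co-ordinates, confirms the representation r=C Sqrt[1+cotan^2[phi]], and converts back to confirm that the points still lie on the same straight line.

```python
import math

C = 2.0  # hypothetical height of the horizontal line y = C (C > 0)

def to_polar(x: float, y: float) -> tuple[float, float]:
    """Re-express a Cartesian point as (r, phi)."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r: float, phi: float) -> tuple[float, float]:
    """Re-express a polar point as (x, y)."""
    return r * math.cos(phi), r * math.sin(phi)

# Sample points on the straight line y = C.
xs = [-3.0, -1.0, 0.5, 2.0, 4.0]
polar_points = [to_polar(x, C) for x in xs]

# In polar form the same line reads r = C * sqrt(1 + cot^2(phi)).
for r, phi in polar_points:
    cot = math.cos(phi) / math.sin(phi)
    assert math.isclose(r, C * math.sqrt(1.0 + cot * cot))

# Converting back recovers points that all still satisfy y = C:
# the line has a new description, not a new shape.
round_trip = [to_cartesian(r, phi) for r, phi in polar_points]
max_deviation = max(abs(y - C) for _, y in round_trip)
print(max_deviation)
```

The deviation from y=C stays at the level of floating-point rounding, which is the sense in which a change of co-ordinates alters the description of the line but not its straightness.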
5.
Section 4 deals with the interaction of form and brightness. This section analyzes the percept involving a dihedral corner which can be seen as convex lit from the left, concave lit from the right, or flat. I see several problems here. First, the model involves the strange notion of a 'dark illuminant'. It is claimed that a convex corner seen as lit from left also involves being lit by a dark illuminant from the right. But phenomenologically, what one sees is that the right side has an attached shadow (due to a more oblique illumination angle and/or weaker ambient illumination), not that there is an additional, dark illuminant. [Author's Response] How would the model deal with more complicated surfaces, say a trihedral corner? Would this involve the assumption of a 'medium' illuminant? What about curved surfaces? [Author's Response] Second, the model involves the assumption that the direction of the illuminant is normal to the illuminated surface. However, many illuminated plane surfaces are illuminated obliquely; in curved surfaces only isolated points are illuminated normally.
In the example with the dihedral corner, when the left side is seen as illuminated, the illumination could come from any direction in front of that side of the cube, not just normally from the left, as assumed. [Author's Response] Third, the model is used to account for the effect in figure 21, in which the right side of a cube is seen as slightly lighter than its top, although they have the same luminance. This is explained as due to the notion that all the other cubes in the figure induce the percept of a dark illuminant from the right.
However, I continue to see the lightness effect even when I cover all the other cubes.
Finally, the account uses the co-ordinate transformation of perceptual space which was criticized above in point 4. Furthermore, within the co-ordinate transformation, the model involves an explicit location of the source of illumination in the perceptual space, whereas in reality we do not need to see the source in order to perceive illumination; in fact, we actively avoid looking at sources, even peripherally.
6.
In section 5.2 the author claims that 'formal attempts to express the problem of shape from shading in rigorous mathematical terms have only served to highlight the multiple uncertainties inherent in the problem. The Gestalt bubble model on the other hand offers a simple gestalt solution to this problem'. This is misleading, to say the least. The author analyzes only one class of examples of shape-from-shading, does not cite any work on this problem, and appears to be unaware of its difficulty. He begins his explanation with the phrase 'Since this figure is seen as a unified gestalt, ...', but does not offer any modeling account of when figures are seen as unified gestalts. He claims that his approach 'simplifies these calculations' without explicitly showing the calculations. He states that his model 'integrates cues from a variety of modalities', but does not present any mechanistic details about this notoriously difficult issue. The example he gives is that 'the arrow piercing one of the features marks that feature as a convex bulbe rather than a concave dimple'. However, at least for me, this example of cue integration fails, as I can see that piece as concave or convex, just like all the others without the arrow.
In conclusion, although I see this as an interesting piece of work, in the current form the deficiencies still outweigh the positive aspects. The author tries to deal with a variety of classical perceptual problems, issues on which there are a lot of empirical data and with which a great number of excellent thinkers have grappled for a long time. It is a bit unrealistic to hope that these issues can all be solved in one fell swoop. Intuitions of solutions are fine, but simulations that test them are necessary to convince others with conflicting intuitions. The author should address the concrete modeling issues noted above and show more awareness of the relevant literature and known phenomena and problems in the area he deals with. Finally, it is, let's say, unproductive to claim of one's own model that it 'represents so great a departure from the conventional approach to modeling perception', when there are others, starting with Grossberg and associates, but also other schools, who have done similar work, but both mathematically specified and simulated.