Rebuttal

The isomorphic reification of the image into three distinct perceptual components does not preclude an influence due to intersection types. Indeed, the multistability of the system offers an opportunity for a "vertex type detector" to contribute its own input to the final state of the system. For example, Adelson categorizes vertices into non-reversing, single-reversing, and double-reversing types, where the reversal refers to a reversal of contrast polarity along a particular edge.

Figure 2

For example, the single-reversing edge in Figure 2 (B) is the vertical edge, because it reverses from a light/dark polarity above the vertex to a dark/light polarity below the vertex, whereas the horizontal edge in Figure 2 (B) is uniformly darker above and lighter below. A non-reversing edge is suggestive of transparency, or alternatively of an illuminance edge, whereas a reversing edge must involve some reflectance change, as at a painted edge. In Adelson's model these vertices label a surface as either transparent or opaque. This highlights the difference between the symbolic abstraction approach and the isomorphic reification approach. In the symbolic approach the label defining the perceived surface property is attached or linked to the surface like a tag, which does not, of itself, change the pixel representation of that surface in any way. This is like saying that when the percept of a surface flips between transparent and opaque, for example, this occurs as the flipping of a single symbolic variable that refers to the surface in question. In an isomorphic model, since every point on the perceived surface changes fundamentally with the perceptual flip, every pixel in the internal representation of that surface must also change as the percept flips, which would correspond to the flipping of images in the three-layer model.

A vertex-type detector can therefore be easily added to this model in order to bias the result one way or the other. In other words, this model can be formulated to behave much like Adelson's symbolic abstraction model, with two significant differences. First, the system is fundamentally multi-stable, which means that local conditions that suggest a certain interpretation can always be overridden by a conflicting global configuration. That is, the dynamic system does not follow a rigid set of deterministic rules, but rather remains flexible and sensitive to external influences, for example from other modalities. No matter how complex a dynamic system becomes, it is always possible to attach extra dynamic forces, like inserting extra springs between existing components. The second significant difference is that the symbolic system is inconsistent with the subjective experience of perception, an objection that is not normally considered.
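The idea of a local bias that can be overridden by a global one can be illustrated with a toy bistable system. The sketch below is not the three-layer model itself: it assumes a single state variable x with double-well dynamics dx/dt = x - x^3 + (sum of biases), where x > 0 stands for one percept and x < 0 for the other. The function name `settle` and the bias values are hypothetical, chosen only for illustration.

```python
def settle(x0, biases, dt=0.05, steps=2000):
    """Euler-integrate dx/dt = x - x**3 + sum(biases) from x0; return the final state.

    x > 0 stands for one percept (say "transparent"), x < 0 for the other ("opaque").
    This one-variable system is an assumed stand-in, not the author's model.
    """
    x = x0
    total_bias = sum(biases)
    for _ in range(steps):
        x += dt * (x - x**3 + total_bias)
    return x


local_vertex_bias = 0.2   # hypothetical input from a vertex-type detector
global_bias = -0.5        # hypothetical conflicting global configuration

# Local bias alone tips an undecided state (x0 = 0) toward the positive percept.
print(settle(0.0, [local_vertex_bias]))

# A stronger conflicting global bias overrides the local suggestion.
print(settle(0.0, [local_vertex_bias, global_bias]))

# Multistability: a weak local bias does not dislodge an already settled percept.
print(settle(-1.0, [local_vertex_bias]))
```

Each bias is just another additive force on the state, so further influences can be appended to the `biases` list without changing the rest of the system, in the spirit of inserting an extra spring.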

There is another, deeper level at which the Gestalt Bubble model challenges the general approach pursued by Adelson. While Adelson categorizes the whole X-junction as non-reversing, single-reversing, or double-reversing, and labels adjacent surfaces accordingly, the reification approach essentially decomposes the X-junction into its component edges, one vertical and the other horizontal, and attempts to complete each edge through the vertex independently of the other. A non-reversing edge by its nature will complete more easily through the vertex in a contrast-sensitive representation, and the completed edge will in turn promote a corresponding surface percept on either side of that edge. Even a reversing edge, like the vertical edge in Figure 2 (B), preserves a property of "milkiness" (semi-transparency) vs. clearness (no surface) across the junction, and thus promotes that perceptual interpretation through the vertex. The double-reversing edges in Figure 2 (C) resist contrast-sensitive completion through the vertex, and therefore are seen as an X-junction in a single plane, rather than two edges crossing at different depths. I have not worked out the details of this explanation any further than this initial observation, but merely suggest that Adelson's model of junction types may represent a higher-level analysis of the outcome of the combinatorial interactions of simpler collinear completion influences, which however require a spatially reified representation to allow the edges to cross in two dimensions while remaining distinct in depth. A paper at this more detailed level of analysis would almost certainly be rejected for publication at this point, since a full spatial reification would be rejected out of hand as neurally implausible. This highlights the need for a more general presentation of the principle of spatial reification, justified by the larger issues of visual perception, before a more detailed analysis is attempted.
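The edge-by-edge decomposition described above can be sketched in a few lines. The example assumes an X-junction is summarized by the luminances of its four quadrants (top-left, top-right, bottom-left, bottom-right), tests each component edge separately for a reversal of contrast polarity through the vertex, and only then recovers Adelson's three junction categories by counting reversing edges. The function names and the luminance values are hypothetical and are not taken from Figure 2.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def edge_reverses(a1, b1, a2, b2):
    """True if contrast polarity across an edge flips on passing through the vertex.

    (a1, b1) are the luminances on either side of the edge before the vertex,
    (a2, b2) the luminances on the same two sides after the vertex.
    """
    return sign(a1 - b1) * sign(a2 - b2) < 0


def classify_x_junction(tl, tr, bl, br):
    """Analyze each edge on its own, then label the junction by counting reversals."""
    vertical = edge_reverses(tl, tr, bl, br)    # left vs. right, above then below
    horizontal = edge_reverses(tl, bl, tr, br)  # top vs. bottom, left then right
    labels = ("non-reversing", "single-reversing", "double-reversing")
    return {
        "vertical_reverses": vertical,
        "horizontal_reverses": horizontal,
        "label": labels[vertical + horizontal],
    }


# Hypothetical luminances (0 = black, 1 = white), chosen for illustration.
print(classify_x_junction(0.2, 0.4, 0.5, 0.9))  # neither edge reverses
print(classify_x_junction(0.4, 0.2, 0.6, 0.8))  # vertical edge reverses only
print(classify_x_junction(0.8, 0.2, 0.2, 0.8))  # both edges reverse
```

The second call mirrors the configuration described for Figure 2 (B): the vertical edge goes from light/dark above the vertex to dark/light below it, while the horizontal edge stays darker above and lighter below. The junction label is derived from the per-edge tests rather than assigned directly, which is the sense in which the junction categories could be read as a summary of simpler edge-level outcomes.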