The new draft now incorporates this additional text:
"Of course the lightness and illuminance nodes are completely symmetrical, so the mutual inhibition results in a bistable system, where either node could potentially win the competition. Boundary completion and surface filling-in occuring separately within each of these layers would serve to unify the features in each, i.e. a lightness node that wins the competition with its corresponding illuminance node, would then also support adjacent lightness nodes to win their competition in the same manner, with the result that whole patches of the system would tend to flip or flop together, due to a spatial field- like interaction within each image. Similarly, the boundary signals in each layer might inhibit boundary signals in the other layer at the same spatial location, so that any particular boundary would tend to settle exclusively in one or the other layer, and a boundary present in a particular layer would in turn bound the diffusion of brightness signal within that layer, thus reifying the perceived surfaces corresponding to that boundary. The system as a whole would remain bistable, i.e. the entire reflectance and illuminance images depicted in Figure 9 would, in the absence of a stabilizing influence, tend to spontaneously reverse, or change places. This multistability expresses the ambiguity inherent in the stimulus of Figure 8 (A). It might be reasonable to suppose for example that a pattern of illumination is more likely to be more uniform, or be composed of larger uniform regions than the pattern of reflectance in a typical scene. This additional soft constraint could be added to the dynamic model, for example, by providing a larger brightness diffusion constant in the illuminance image than in the lightness image, i.e. an illuminance node would have a stronger influence on neighboring illuminance nodes, so that large uniform regions would be more stable in the illuminance image than in the lightness image. While this additional influence would tend to stabilize the system in one state, the system would remain essentially multistable, i.e. the images could potentially still reverse under the influence of additional forces that support the alternative perceptual interpretation."
In response to the critique: "How would such a symmetrical arrangement lead to the generation of two structurally quite different outcomes ...? It might, but it might not, and without a simulation one just does not know." I have now incorporated the following text (bold emphasis added here):
2.5 A General Modeling Approach
The three-layer model described here is not advanced as a specific computational model of brightness perception, for the model as presented so far is not sufficiently specified to establish whether it could actually be made to work as described. The intent rather is to present a general approach to modeling in terms of a multistable dynamic system whose equilibrium states represent the final percept. The multistability of the system is the strength of the approach, because a multistable system remains sensitive to external forces, for example from other modalities (e.g. color, texture, motion, etc.), so that even a small external force can potentially flip the system from one state to another, especially when the internal forces within the module are balanced, as in the perception of ambiguous forms. And yet in each of the states, multiple local interactions reify a complete, self-consistent perceptual interpretation in the manner of a Necker cube, i.e. as the reflectance and illuminance images flip, each image is completely reified in its new location. This aspect of such dynamic systems embodies the Gestalt principle of global emergent properties arising from local dynamic forces. It is this general modeling strategy, rather than the specifics of any particular model, that is being presented here and illustrated with a specific example. It is immaterial whether the details of this particular implementation are right; what is at issue is whether this kind of general modeling approach accurately reflects the nature of biological vision systems. For if this same kind of multistability were active in biological vision, it would shed new light on our notions of low-level vs. high-level processing, and would revise our view of the neuron from that of an input-output "feature detector" firing in response to its tuned feature to that of one small force amongst a multitude of tiny forces that all contribute to the final global state of the system.
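To illustrate the claim that a small external force can tip an otherwise balanced multistable system, the node-pair dynamics from the sketch above can be reduced to a single location and given a tiny constant bias standing in for an input from another modality. Again, the equations and parameter values are illustrative assumptions rather than part of the model as specified in the draft.

    import numpy as np

    def settle(bias_L=0.0, bias_I=0.0, k=1.5, dt=0.05, steps=4000):
        # One lightness/illuminance node pair with mutual inhibition.
        # The bias terms stand in for a small external force (e.g. from
        # another modality) acting on an otherwise balanced system.
        L, I = 0.0, 0.0                      # perfectly balanced start
        for _ in range(steps):
            dL = -L + np.tanh(L - k * I) + bias_L
            dI = -I + np.tanh(I - k * L) + bias_I
            L, I = L + dt * dL, I + dt * dI
        return L, I

    print(settle(bias_L=0.01))   # tiny force on the lightness node: L wins
    print(settle(bias_I=0.01))   # tiny force on the illuminance node: I wins

In either case the winning node settles at essentially the same equilibrium value; the small bias only selects which of the two stable states the system falls into, which is the sense in which such a system remains sensitive to weak external influences while still reifying a complete state.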