Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping

Ben Lonnqvist, Zhengqing Wu, Michael H. Herzog

TL;DR

This work mathematically demonstrates that under realistic assumptions, neural noise can be used to separate objects from each other, and shows that adding noise in a DNN enables the network to segment images even though it was never trained on any segmentation labels.

Abstract

Humans are able to segment images effortlessly without supervision using perceptual grouping. Here, we propose a counter-intuitive computational approach to solving unsupervised perceptual grouping and segmentation: that they arise because of neural noise, rather than in spite of it. We (1) mathematically demonstrate that under realistic assumptions, neural noise can be used to separate objects from each other; (2) that adding noise in a DNN enables the network to segment images even though it was never trained on any segmentation labels; and (3) that segmenting objects using noise results in segmentation performance that aligns with the perceptual grouping phenomena observed in humans, and is sample-efficient. We introduce the Good Gestalt (GG) datasets -- six datasets designed to specifically test perceptual grouping, and show that our DNN models reproduce many important phenomena in human perception, such as illusory contours, closure, continuity, proximity, and occlusion. Finally, we (4) show that our model improves performance on our GG datasets compared to other tested unsupervised models by $24.9\%$. Together, our results suggest a novel unsupervised segmentation method requiring few assumptions, a new explanation for the formation of perceptual grouping, and a novel potential benefit of neural noise.

Paper Structure

This paper contains 34 sections, 11 equations, 28 figures, 4 tables, 1 algorithm.

Figures (28)

  • Figure 1: Latent Noise Segmentation Schematic Illustration. (a) Biological neurons are highly noisy. For example, thermal noise and ion channel shot noise [manwani1998] contribute to independent noise in neurons. (b) Independent noise affects the output of neurons that are highly selective to a stimulus feature (left) more than the output of neurons that are less selective (right). Solid lines indicate the mean of the noise-free activity distribution, and dashed lines indicate the actual sample after independent noise is added. The x-axis in the illustration is not meaningful. (c) In the system’s representational space (as indicated by the surface where input images are mapped to points on that surface), independent noise causes meaningful changes to the model’s representation in relevant directions, e.g., local Principal Component (PC) directions, but not in irrelevant directions (that would substantially change the model's representation of the input). (d) This yields information about the objects in the input image, and can be used to segment the input images. An input image $\mathbf{x}^{(i)}$ is fed to an autoencoder network, and noisy samples are drawn and consecutively subtracted from each other. These outputs contain information about the changes induced by noise in latent space, cast into image space. The outputs are stacked and clustered pixel-wise to generate a segmentation mask.
  • Figure 2: The Good Gestalt (GG) Datasets. Zoom in for an ideal viewing experience. The first two rows of images show training image examples, while the second two rows show images of testing examples.
  • Figure 3: Target segmentation mask examples of the GG datasets. The first row of images shows training image examples, while the second row shows images of testing examples. Different colors indicate different objects. Kanizsa Squares: The model should segment a square in the center, and four background circles separately from the background. Closure: The model should segment a square. Continuity: The model should segment a circle traced by the relevant line segments to "complete the circle". Proximity: The model should segment a set of six squares together, or three sets of two squares when the proximity cue is given. Gradient Occlusion: The model should segment the two rectangles separately. Illusory Occlusion: The model should segment the static background and stripes together, and the foreground object parts together.
  • Figure 4: Model Output Segmentation Mask Examples. The first row shows inputs, the second row shows VAE segmentation masks, and the third row shows AE segmentation masks. Randomly selected examples from the Kanizsa Squares, Closure, Continuity, Proximity, Gradient Occlusion, and Illusory Occlusion datasets are shown using the best model hyperparameters shown in Table 1. Since the model segments by clustering noisy outputs, the specific color assignment to different object identities is arbitrary (for example, whether the model assigns the identity represented by yellow as the background, or purple, is not a meaningful distinction).
  • Figure 5: The optimal noise level for the AE and the VAE across datasets. The AE performs best with a larger range of noise values, while the VAE consistently prefers low noise values.
  • ...and 23 more figures
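
The procedure described in the Figure 1(d) caption (encode an image, draw noisy samples, subtract consecutive outputs, stack, cluster pixel-wise) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy linear `encode`/`decode` pair, the Gaussian noise added to the latent code, the parameter names (`n_samples`, `noise_std`, `n_clusters`), and the small k-means routine with farthest-point initialization are all assumptions made for the example.

```python
import numpy as np


def kmeans(points, k, n_iter=20):
    """Tiny k-means with deterministic farthest-point initialization (assumed choice)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest existing center.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def latent_noise_segmentation(x, encode, decode, n_samples=16,
                              noise_std=0.1, n_clusters=2, seed=0):
    """Segment image x by clustering differences of noisy reconstructions."""
    rng = np.random.default_rng(seed)
    z = encode(x)
    # Decode several copies of the latent code, each with independent noise.
    outputs = np.stack([
        decode(z + noise_std * rng.standard_normal(z.shape))
        for _ in range(n_samples)
    ])
    # Subtract consecutive noisy outputs: shape (n_samples - 1, H, W).
    diffs = outputs[1:] - outputs[:-1]
    h, w = x.shape
    # Stack the differences so that each pixel gets one feature vector.
    features = diffs.reshape(n_samples - 1, h * w).T
    return kmeans(features, n_clusters).reshape(h, w)


# Hypothetical toy "autoencoder": one latent unit per image half.
mask_a = np.zeros((4, 6))
mask_a[:, :3] = 1.0
mask_b = 1.0 - mask_a


def encode(x):
    return np.array([(x * mask_a).sum() / mask_a.sum(),
                     (x * mask_b).sum() / mask_b.sum()])


def decode(z):
    return z[0] * mask_a + z[1] * mask_b


image = 0.2 * mask_a + 0.8 * mask_b
seg = latent_noise_segmentation(image, encode, decode)
```

In this toy setting, pixels driven by the same latent unit fluctuate together across the noisy samples, so their difference vectors coincide and they fall into the same cluster. As the Figure 4 caption notes for the real models, which integer label each region receives is arbitrary.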