
Patch-Based Deep Unsupervised Image Segmentation using Graph Cuts

Isaac Wasserman, Jeova Farias Sales Rocha Neto

TL;DR

GraPL tackles unsupervised pixel-level image segmentation by training a patch-level CNN classifier guided by an iteratively minimized graph-cut energy. It alternates between optimizing patch labels via a minimum s-t cut and applying gradient updates to the network parameters, yielding a fully convolutional segmenter without labeled data. The method can use pretrained patch embeddings (e.g., DINOv2) as affinity cues while relying on graph cuts for regularization, and it reports state-of-the-art performance on BSDS500 (mIoU ≈ 0.53, accuracy ≈ 0.57). This single-image, postprocessing-free framework highlights the value of integrating graph-based regularization into deep patch-based segmentation.

Abstract

Unsupervised image segmentation aims at grouping different semantic patterns in an image without the use of human annotation. Similarly, image clustering searches for groupings of images based on their semantic content without supervision. Classically, both problems have captivated researchers as they drew from sound mathematical concepts to produce concrete applications. With the emergence of deep learning, the scientific community turned its attention to complex neural network-based solvers that achieved impressive results in those domains but rarely leveraged the advances made by classical methods. In this work, we propose a patch-based unsupervised image segmentation strategy that bridges advances in unsupervised feature extraction from deep clustering methods with the algorithmic help of classical graph-based methods. We show that a simple convolutional neural network, trained to classify image patches and iteratively regularized using graph cuts, naturally leads to a state-of-the-art fully-convolutional unsupervised pixel-level segmenter. Furthermore, we demonstrate that this is the ideal setting for leveraging the patch-level pairwise features generated by vision transformer models. Our results on real image data demonstrate the effectiveness of our proposed methodology.
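To make the graph-cut regularization step more concrete, the following is a minimal sketch of how a two-label patch labeling could be obtained from a minimum s-t cut. It is an illustration under stated assumptions, not the authors' implementation: the function name `min_cut_labels`, the `unary` cost layout, the Potts penalty `lam` on 4-connected neighbours, and the plain Edmonds-Karp max-flow solver are all hypothetical choices. In the full method the unary costs would presumably come from the network's patch predictions, and more than two segments are handled; neither is shown here.

```python
from collections import deque

EPS = 1e-12  # residual capacities below this are treated as saturated


def min_cut_labels(unary, lam):
    """Binary patch labeling via min s-t cut (illustrative sketch only).

    unary[r][c] = (cost_of_label_0, cost_of_label_1) for the patch at (r, c).
    lam is a hypothetical Potts penalty paid when 4-neighbours disagree.
    Returns (labels, energy), where energy equals the max-flow value.
    """
    rows, cols = len(unary), len(unary[0])
    S, T = "s", "t"
    cap = {}  # cap[u][v] = residual capacity of edge u -> v

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse edge for the residual graph

    for r in range(rows):
        for c in range(cols):
            cost0, cost1 = unary[r][c]
            add_edge(S, (r, c), cost1)  # cut if the patch takes label 1
            add_edge((r, c), T, cost0)  # cut if the patch takes label 0
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    add_edge((r, c), (rr, cc), lam)
                    add_edge((rr, cc), (r, c), lam)

    def bfs():
        # Breadth-first search over edges with remaining capacity.
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    if v != T:
                        queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs()
        if T not in parent:
            break  # no augmenting path left: the flow is maximal
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Patches still reachable from the source get label 0, the rest label 1.
    labels = [[0 if (r, c) in parent else 1 for c in range(cols)]
              for r in range(rows)]
    return labels, flow


# A 1x3 strip of patches: the middle patch weakly prefers label 1.
unary = [[(1, 5), (3, 2), (1, 5)]]
print(min_cut_labels(unary, lam=0.0)[0])  # [[0, 1, 0]]
print(min_cut_labels(unary, lam=1.0)[0])  # [[0, 0, 0]]
```

With `lam=0.0` each patch simply takes its cheapest label; raising `lam` to `1.0` makes the two label changes around the middle patch more expensive than overriding its weak preference, so the labeling becomes spatially uniform. This is the kind of smoothing a pairwise term provides.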

Paper Structure (25 sections, 8 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The proposed algorithm. GraPL trains a convolutional neural network to cluster patches of a single image without supervision under the guidance of graph cuts, spatial continuity loss, and a patch affinity encoder. At inference, this patch clustering knowledge is applied to pixel-level segmentation of the image. $F'_\theta$ and $F_\theta$ share the same parameters.
  • Figure 2: Example of undersegmentation from non-SLIC initialization. (a) Input image. (b) Patchwise Random ($\hat{K}=2$). (c) Seedwise Random ($\hat{K}=4$). (d) Spatial Clustering ($\hat{K}=4$). (e) SLIC ($\hat{K}=6$).
  • Figure 3: Comparison of loss curves using warm and cold starting methods, averaged over all test images in BSDS500 (Arbeláez et al., 2011). The loss is recorded at the end of each gradient step, and the marks on the $x$-axis indicate the points where a new training iteration starts.
  • Figure 4: Effects of pairwise energy coefficient. (a) Effect of $\lambda$ on mIoU. (b) Effect of $\lambda$ on $\Delta K$.
  • Figure 5: Effect of spatial continuity loss weight on mIoU.
  • ...and 1 more figures