Table of Contents
Fetching ...

Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation

Xiao Zhang, Xiangyu Han, Xiwen Lai, Yao Sun, Pei Zhang, Konrad Kording

TL;DR

Falcon tackles suboptimal unsupervised image segmentation by replacing recursive two-way Normalized Cut with a regularized, parallelizable K-way Normalized Cut on transformer-derived tokens. Using a fractional quadratic transform and dynamic affinity regularization, it optimizes all clusters simultaneously, followed by a depth-aware DREAM refinement that fuses RGB and depth cues for high-precision masks. Across six benchmarks, Falcon achieves state-of-the-art mean IoU, notably improving Cityscapes and COCO-Stuff-27, while reducing inference time by around 30% on a single RTX4090, demonstrating strong scalability. By effectively leveraging semantic information in foundation-model attention within a graph-cut framework, Falcon narrows the gap to supervised segmentation and supports scalable, real-world dense prediction pre-training. Code is released at the provided repository.

Abstract

Today's unsupervised image segmentation algorithms often segment suboptimally. Modern graph-cut based approaches rely on high-dimensional attention maps from Transformer-based foundation models, typically employing a relaxed Normalized Cut solved recursively via the Fiedler vector (the eigenvector of the second smallest eigenvalue). Consequently, they still lag behind supervised methods in both mask generation speed and segmentation accuracy. We present a regularized fractional alternating cut (Falcon), an optimization-based K-way Normalized Cut without relying on recursive eigenvector computations, achieving substantially improved speed and accuracy. Falcon operates in two stages: (1) a fast K-way Normalized Cut solved by extending into a fractional quadratic transformation, with an alternating iterative procedure and regularization to avoid local minima; and (2) refinement of the resulting masks using complementary low-level information, producing high-quality pixel-level segmentations. Experiments show that Falcon not only surpasses existing state-of-the-art methods by an average of 2.5% across six widely recognized benchmarks (reaching up to 4.3\% improvement on Cityscapes), but also reduces runtime by around 30% compared to prior graph-based approaches. These findings demonstrate that the semantic information within foundation-model attention can be effectively harnessed by a highly parallelizable graph cut framework. Consequently, Falcon can narrow the gap between unsupervised and supervised segmentation, enhancing scalability in real-world applications and paving the way for dense prediction-based vision pre-training in various downstream tasks. The code is released in https://github.com/KordingLab/Falcon.

Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation

TL;DR

Falcon tackles suboptimal unsupervised image segmentation by replacing recursive two-way Normalized Cut with a regularized, parallelizable K-way Normalized Cut on transformer-derived tokens. Using a fractional quadratic transform and dynamic affinity regularization, it optimizes all clusters simultaneously, followed by a depth-aware DREAM refinement that fuses RGB and depth cues for high-precision masks. Across six benchmarks, Falcon achieves state-of-the-art mean IoU, notably improving Cityscapes and COCO-Stuff-27, while reducing inference time by around 30% on a single RTX4090, demonstrating strong scalability. By effectively leveraging semantic information in foundation-model attention within a graph-cut framework, Falcon narrows the gap to supervised segmentation and supports scalable, real-world dense prediction pre-training. Code is released at the provided repository.

Abstract

Today's unsupervised image segmentation algorithms often segment suboptimally. Modern graph-cut based approaches rely on high-dimensional attention maps from Transformer-based foundation models, typically employing a relaxed Normalized Cut solved recursively via the Fiedler vector (the eigenvector of the second smallest eigenvalue). Consequently, they still lag behind supervised methods in both mask generation speed and segmentation accuracy. We present a regularized fractional alternating cut (Falcon), an optimization-based K-way Normalized Cut without relying on recursive eigenvector computations, achieving substantially improved speed and accuracy. Falcon operates in two stages: (1) a fast K-way Normalized Cut solved by extending into a fractional quadratic transformation, with an alternating iterative procedure and regularization to avoid local minima; and (2) refinement of the resulting masks using complementary low-level information, producing high-quality pixel-level segmentations. Experiments show that Falcon not only surpasses existing state-of-the-art methods by an average of 2.5% across six widely recognized benchmarks (reaching up to 4.3\% improvement on Cityscapes), but also reduces runtime by around 30% compared to prior graph-based approaches. These findings demonstrate that the semantic information within foundation-model attention can be effectively harnessed by a highly parallelizable graph cut framework. Consequently, Falcon can narrow the gap between unsupervised and supervised segmentation, enhancing scalability in real-world applications and paving the way for dense prediction-based vision pre-training in various downstream tasks. The code is released in https://github.com/KordingLab/Falcon.

Paper Structure

This paper contains 13 sections, 3 theorems, 52 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Lemma 5.1

For $a > 0$ and $b > 0$, the following equality holds:

Figures (6)

  • Figure 1: Recursive N-Cut vs Falcon (ours). Our method addresses the graph cut problem by alternately optimizing and regularizing both the soft assignment matrix and the affinity matrix, distinguishing it from recursive N-Cut methods.
  • Figure 2: Falcon Visual Segmentation Comparison. Our Falcon method employs a fractional alternating optimized n‑Cut strategy, enhanced by multiple regularization techniques that effectively overcome local minima. Compared to DiffCut, Falcon reveals higher degree of fine details, for example, in the first image it distinguishes the car front, lanes, distant trees, and high-rise structures; in the second, it segments the intricate details of the train body; in the third, it separates billboards from rooftops; in the fourth, it isolates a child resting on a tractor; in the fifth, it clearly differentiates the castle’s surrounding walls and vines; in the sixth, it extracts items within a cabinet; in the final image, virtual environment with complex lighting conditions, Falcon robustly segments the complete human figure.
  • Figure 3: Overview of Falcon. (1) Image feature extraction: we extract tokens and depth map from the input image. (2) Alternating Cut: we construct affinity matrix between tokens and alternately optimize it with the soft assignment matrix. (3) Depth-aware Adaptive Mask Refinement: depth map and original RGB image flows into the similarity weights fusion module and produce the weights which is used to iterative refine the mask assignment obtained from Alternating Cut step.
  • Figure 4: Depth-aware Non-linear Adaptive Mask Refinement (DREAM). RGB and depth Similarity weight matrix are constructed based on the affinity between current pixel and its neighbors, and are fused through the blending weights. Then the mask label iteratively updates based on the fused weights.
  • Figure 5: The comparison of total evaluation time on various datasets. Falcon can shorten about 30% inference time than recursive N-Cut on a single RTX4090.
  • ...and 1 more figures

Theorems & Definitions (6)

  • Lemma 5.1: Quadratic Transform for Ratios
  • proof
  • Theorem 5.3: Monotonic Convergence
  • proof
  • Definition 5.4: Assignment-Consistent Reweighting
  • Proposition 5.5: Structural Invariance