PANC: Prior-Aware Normalized Cut for Object Segmentation

Juan Gutiérrez, Victor Gutiérrez-Garcia, José Luis Blanco-Murillo

TL;DR

PANC tackles the high cost of pixel-perfect segmentation by injecting a small bank of token-level priors into a dense, self-supervised ViT token graph to steer a training-free normalized-cut segmentation toward a user-specified object. By augmenting the token affinity with two anchor nodes whose connections encode foreground/background priors, PANC biases the Fiedler vector toward semantically meaningful partitions while preserving the global grouping provided by dense self-supervised features. The approach achieves state-of-the-art results among weakly supervised and unsupervised methods on benchmarks like DUTS-TE, ECSSD, and COCO-derived subsets, and it excels in homogeneous or texture-limited domains (e.g., CFD, HAM10000, CUB) where traditional saliency fails. Its contributions include a compact prior bank construction, deterministic binarization, a GPU-accelerated spectral pipeline, and comprehensive ablations that illuminate the effects of priors, anchor strength, and resolution on segmentation quality and reproducibility.

Abstract

Fully unsupervised segmentation pipelines naively seek the most salient object, if one is present. As a result, most of the methods reported in the literature deliver non-deterministic partitions that are sensitive to initialization, seed order, and threshold heuristics. We propose PANC, a weakly supervised spectral segmentation framework that uses a minimal set of annotated visual tokens to produce stable, controllable, and reproducible object masks. Building on the TokenCut approach, we augment the token-token affinity graph with a handful of priors coupled to anchor nodes. By manipulating the graph topology, we bias the spectral eigenspace toward partitions that are consistent with the annotations. Our approach preserves the global grouping enforced by dense self-supervised visual features, trading annotated tokens for significant gains in reproducibility, user control, and segmentation quality. Using 5 to 30 annotations per dataset, our training-free method achieves state-of-the-art performance among weakly supervised and unsupervised approaches on standard benchmarks (e.g., DUTS-TE, ECSSD, MS COCO). Moreover, it excels in domains where dense labels are costly or intra-class differences are subtle. We report strong and reliable results on homogeneous, fine-grained, and texture-limited domains, achieving 96.8% (+14.43% over SotA), 78.0% (+0.2%), and 78.8% (+0.37%) average mean intersection-over-union (mIoU) on the CrackForest (CFD), CUB-200-2011, and HAM10000 datasets, respectively. For multi-object benchmarks, the framework showcases explicit, user-controllable semantic segmentation.
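
The anchor-node idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `panc_sketch`, the anchor weighting rule (maximum cosine similarity to the prior bank), the `anchor_strength` parameter, the mean-based threshold, and the default values of `tau` and `eps` are all assumptions chosen for illustration. The sketch builds a thresholded token-token affinity, appends one foreground and one background anchor node, solves the normalized-cut eigenproblem, and orients the Fiedler vector so that the foreground anchor's side is labeled as the object.

```python
import numpy as np

def panc_sketch(tokens, fg_priors, bg_priors, tau=0.2, eps=1e-5, anchor_strength=1.0):
    """Illustrative prior-aware normalized cut (hypothetical helper).

    tokens: (N, d) token features; fg_priors / bg_priors: (K, d) prior tokens.
    Returns a boolean foreground mask of shape (N,).
    """
    def unit(x):
        x = np.asarray(x, dtype=float)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    t, f, b = unit(tokens), unit(fg_priors), unit(bg_priors)
    n = t.shape[0]

    # Token-token affinity: thresholded cosine similarity (assumed rule).
    w_tt = np.where(t @ t.T > tau, 1.0, eps)

    # Token-anchor affinity: best match against each prior bank (assumed rule).
    w_fg = anchor_strength * np.clip((t @ f.T).max(axis=1), eps, None)
    w_bg = anchor_strength * np.clip((t @ b.T).max(axis=1), eps, None)

    # Augmented (N+2) x (N+2) affinity; the two anchors are not linked to each other.
    W = np.zeros((n + 2, n + 2))
    W[:n, :n] = w_tt
    W[:n, n] = W[n, :n] = w_fg
    W[:n, n + 1] = W[n + 1, :n] = w_bg

    # Solve (D - W) y = lambda * D y through the symmetric normalized Laplacian.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(n + 2) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)   # eigenvalues are returned in ascending order
    y = d_inv_sqrt * vecs[:, 1]       # second-smallest eigenvector (Fiedler vector)

    # Deterministic orientation: the foreground anchor must lie on the positive side.
    if y[n] < y[n + 1]:
        y = -y
    scores = y[:n]
    return scores > scores.mean()     # mean threshold, an assumption of this sketch
```

Because the sign of an eigenvector is arbitrary, the orientation step is what makes the output repeatable in this sketch: swapping the two prior banks flips the mask instead of leaving the labeling to chance.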

Paper Structure

This paper contains 65 sections, 3 equations, 18 figures, 6 tables, and 1 algorithm.

Figures (18)

  • Figure 1: Schematic of the proposed PANC framework. To segment an input image (top left), the NCut algorithm (center) relies on building an affinity graph. The result (top right) is largely uncontrolled in terms of which "salient" objects are segmented and the labels assigned to them. Our solution (bottom center) introduces a minimal set of labels (bottom left) as annotated priors directly into the affinity graph. Segmentation on the augmented graph focuses on the classes exemplified by the priors and fosters consistency in spectral partitioning.
  • Figure 2: Overview of the proposed pipeline. The input image is tokenized using a ViT. We use the resulting representation to retrieve suitable priors from the annotated bank and to build the affinity matrix. The extended affinity matrix is then processed in the same way as in TokenCut to produce the final segmentation mask.
  • Figure 3: Examples from the ECSSD benchmark. Columns display the input image, the eigenvector-based attention map (normalized Fiedler scores visualized as a heatmap), and the binarized mask; each row shows a different sample image.
  • Figure 4: Visual examples of PANC's class-selective controllability on MS COCO. For each input, we show two results: segmentation using 'dog' priors (top row) and 'person' priors (bottom row). The Eigen Attention map and final Mask correctly shift focus to the target class specified by the priors, demonstrating PANC's ability to control segmentation in multi-object scenes.
  • Figure 5: Qualitative comparison on challenging specialized datasets (HAM, CFD, CUB). This figure illustrates PANC's robustness on homogeneous and low-semantic-content images where unsupervised baselines fail.
  • ...and 13 more figures