PANC: Prior-Aware Normalized Cut for Object Segmentation
Juan Gutiérrez, Victor Gutiérrez-Garcia, José Luis Blanco-Murillo
TL;DR
PANC tackles the high cost of pixel-perfect segmentation labels by injecting a small bank of token-level priors into a dense, self-supervised ViT token graph, steering a training-free normalized-cut segmentation toward a user-specified object. By augmenting the token affinity with two anchor nodes whose connections encode foreground/background priors, PANC biases the Fiedler vector toward semantically meaningful partitions while preserving the global grouping provided by dense self-supervised features. The approach achieves state-of-the-art results among weakly supervised and unsupervised methods on benchmarks such as DUTS-TE, ECSSD, and COCO-derived subsets, and it excels in homogeneous or texture-limited domains (e.g., CFD, HAM10000, CUB) where traditional saliency fails. Its contributions include a compact prior bank construction, deterministic binarization, a GPU-accelerated spectral pipeline, and comprehensive ablations that illuminate the effects of priors, anchor strength, and resolution on segmentation quality and reproducibility.
Abstract
Fully unsupervised segmentation pipelines naively seek the most salient object, if one is present. As a result, most methods reported in the literature deliver non-deterministic partitions that are sensitive to initialization, seed order, and threshold heuristics. We propose PANC, a weakly supervised spectral segmentation framework that uses a minimal set of annotated visual tokens to produce stable, controllable, and reproducible object masks. Building on the TokenCut approach, we augment the token-token affinity graph with a handful of priors coupled to anchor nodes. By manipulating the graph topology, we bias the spectral eigenspace toward partitions that are consistent with the annotations. Our approach preserves the global grouping enforced by dense self-supervised visual features, trading a few annotated tokens for significant gains in reproducibility, user control, and segmentation quality. Using 5 to 30 annotations per dataset, our training-free method achieves state-of-the-art performance among weakly supervised and unsupervised approaches on standard benchmarks (e.g., DUTS-TE, ECSSD, MS COCO). Moreover, it excels in domains where dense labels are costly or intra-class differences are subtle. We report strong and reliable results on homogeneous, fine-grained, and texture-limited domains, achieving 96.8% (+14.43% over SotA), 78.0% (+0.2%), and 78.8% (+0.37%) average mean intersection-over-union (mIoU) on the CrackForest (CFD), CUB-200-2011, and HAM10000 datasets, respectively. For multi-object benchmarks, the framework showcases explicit, user-controllable semantic segmentation.
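To make the anchor-node idea concrete, the following is a minimal NumPy sketch of a normalized cut on a token graph augmented with one foreground and one background anchor. It is an illustration under stated assumptions, not the authors' implementation: the function name `anchored_ncut_mask`, the thresholded-cosine affinity with `tau` and `eps` (in the style of TokenCut), the max-cosine coupling between tokens and prior tokens, the `strength` parameter, and the midpoint binarization rule are all hypothetical choices made here for illustration. It also runs on the CPU, whereas the paper describes a GPU-accelerated pipeline.

```python
import numpy as np


def anchored_ncut_mask(feats, fg_prior, bg_prior, tau=0.2, eps=1e-5, strength=1.0):
    """Hypothetical sketch: feats is (N, d); fg_prior and bg_prior are (K, d)."""

    def unit(x):
        # L2-normalize rows so that dot products are cosine similarities.
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    f, fg, bg = unit(feats), unit(fg_prior), unit(bg_prior)
    n = f.shape[0]

    # Token-token affinity: thresholded cosine similarity (assumed form).
    w_tt = np.where(f @ f.T > tau, 1.0, eps)

    # Assumed coupling: a token's edge to an anchor is its best cosine match
    # against that anchor's prior tokens, clipped to be non-negative.
    a_fg = strength * np.clip((f @ fg.T).max(axis=1), 0.0, None) + eps
    a_bg = strength * np.clip((f @ bg.T).max(axis=1), 0.0, None) + eps

    # Augmented affinity: node n is the foreground anchor, n + 1 the background.
    w = np.full((n + 2, n + 2), eps)
    w[:n, :n] = w_tt
    w[:n, n] = a_fg
    w[n, :n] = a_fg
    w[:n, n + 1] = a_bg
    w[n + 1, :n] = a_bg

    # Normalized cut: solve (D - W) v = lambda * D v through the symmetric
    # normalized Laplacian and keep the second-smallest eigenvector.
    d_isqrt = 1.0 / np.sqrt(w.sum(axis=1))
    l_sym = np.eye(n + 2) - d_isqrt[:, None] * w * d_isqrt[None, :]
    _, vecs = np.linalg.eigh(l_sym)  # eigenvalues come back in ascending order
    fiedler = d_isqrt * vecs[:, 1]

    # Deterministic binarization (assumed rule): orient the vector so the
    # foreground anchor has the larger value, then cut at the anchor midpoint.
    if fiedler[n] < fiedler[n + 1]:
        fiedler = -fiedler
    midpoint = 0.5 * (fiedler[n] + fiedler[n + 1])
    return fiedler[:n] > midpoint
```

Because the sign of the eigenvector is fixed by the anchors rather than by a heuristic, swapping `fg_prior` and `bg_prior` in this sketch inverts the returned mask, which mirrors the user-controllable behavior described above.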
