CLUSTSEG: Clustering for Universal Segmentation
James Liang, Tianfei Zhou, Dongfang Liu, Wenguan Wang
TL;DR
CLUSTSEG presents a universal, transformer-based framework that unifies superpixel, semantic, instance, and panoptic segmentation by recasting segmentation as iterative clustering. It introduces task-aware Dreamy-Start initialization and a nonparametric Recurrent Cross-Attention mechanism that performs EM-like cluster updates without extra learnable parameters, enabling transparent and effective pixel clustering. Across panoptic, instance, semantic, and superpixel benchmarks, CLUSTSEG achieves state-of-the-art or competitive results, and the ablations confirm the critical roles of initialization and recursive clustering. The approach offers a flexible, architecture-agnostic pathway toward unified dense prediction, with practical implications for large-scale visual understanding.
Abstract
We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme. Regarding queries as cluster centers, CLUSTSEG is innovative in two aspects: 1) cluster centers are initialized in heterogeneous ways so as to pointedly address task-specific demands (e.g., instance- or category-level distinctiveness), yet without modifying the architecture; and 2) pixel-cluster assignment, formalized in a cross-attention fashion, is alternated with cluster center update, yet without learning additional parameters. These innovations closely link CLUSTSEG to EM clustering and make it a transparent and powerful framework that yields superior results across the above segmentation tasks.
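To make the second point concrete, the following is a minimal NumPy sketch of how pixel-cluster assignment in a cross-attention form can alternate with a parameter-free center update, in the spirit of EM clustering. It is an illustration under stated assumptions, not the authors' implementation: the function name `recurrent_cross_attention`, the scaled dot-product similarity, the softmax taken over the cluster axis, the weighted-mean center update, and the default of three iterations are all choices made here for clarity. The task-specific center initialization is not modeled; the initial centers are simply passed in.

```python
import numpy as np


def recurrent_cross_attention(pixels, centers, num_iters=3):
    """EM-like clustering sketch (hypothetical helper, not the paper's code).

    pixels:  (N, D) array of pixel features.
    centers: (K, D) array of initial cluster centers (the "queries").
    Returns the updated centers (K, D) and the soft assignment (K, N)
    computed in the last E-step. No learnable parameters are involved.
    """
    dim = pixels.shape[1]
    for _ in range(num_iters):
        # E-step: scaled dot-product similarity between centers and pixels,
        # followed by a softmax over the cluster axis so that each pixel's
        # assignment probabilities sum to one (assumption of this sketch).
        logits = centers @ pixels.T / np.sqrt(dim)
        logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
        assign = np.exp(logits)
        assign = assign / assign.sum(axis=0, keepdims=True)

        # M-step: each center becomes the assignment-weighted mean of the
        # pixel features (assumption: normalized by total assignment mass).
        weights = assign / (assign.sum(axis=1, keepdims=True) + 1e-8)
        centers = weights @ pixels
    return centers, assign


# Usage: two well-separated groups of 2-D "pixel" features.
pixels = np.array([[-5.0, -5.0], [-5.2, -4.8], [5.0, 5.0], [4.8, 5.2]])
init_centers = np.array([[-1.0, -1.0], [1.0, 1.0]])
centers, assign = recurrent_cross_attention(pixels, init_centers)
labels = assign.argmax(axis=0)  # hard cluster label per pixel
```

Taking the softmax over clusters rather than over pixels is what makes the attention map readable as a soft pixel-to-cluster assignment; a hard segmentation then follows from the per-pixel argmax.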
