CLUSTSEG: Clustering for Universal Segmentation

James Liang; Tianfei Zhou; Dongfang Liu; Wenguan Wang

TL;DR

CLUSTSEG is a universal, transformer-based framework that unifies superpixel, semantic, instance, and panoptic segmentation by recasting segmentation as iterative clustering. It introduces task-aware Dreamy-Start initialization and a nonparametric Recurrent Cross-Attention mechanism that performs EM-like cluster updates without extra learnable parameters, which makes the pixel clustering transparent and effective. Across panoptic, instance, semantic, and superpixel benchmarks, CLUSTSEG achieves state-of-the-art or competitive results, and ablations confirm the critical roles of initialization and recursive clustering. The approach offers a flexible pathway toward unified dense prediction that requires no task-specific architectural changes, with practical implications for large-scale visual understanding.

Abstract

We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified neural clustering scheme. Regarding queries as cluster centers, CLUSTSEG is innovative in two aspects: 1) cluster centers are initialized in heterogeneous ways so as to pointedly address task-specific demands (e.g., instance- or category-level distinctiveness), yet without modifying the architecture; and 2) pixel-cluster assignment, formalized in a cross-attention fashion, is alternated with cluster center update, yet without learning additional parameters. These innovations closely link CLUSTSEG to EM clustering and make it a transparent and powerful framework that yields superior results across the above segmentation tasks.
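
The alternation between pixel-cluster assignment and center update described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `recurrent_cross_attention`, the absence of query/key/value projections, the `1/sqrt(D)` scaling, and the normalization of each center by its assignment mass are all assumptions made here for illustration. The one property taken from the text is that the softmax runs over the cluster axis, so each pixel distributes its membership across centers, and that the loop itself introduces no learnable parameters.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def recurrent_cross_attention(centers, pixels, num_iters=3):
    """EM-like clustering sketch (hypothetical helper, not the paper's code).

    centers: (K, D) initial cluster centers (the "queries").
    pixels:  (N, D) flattened pixel features.
    Returns the updated centers (K, D) and the soft assignment (K, N)
    computed in the last E-step, i.e. before the final center update.
    """
    dim = pixels.shape[1]
    assign = None
    for _ in range(num_iters):
        # E-step: similarity of every center to every pixel, then a softmax
        # over the cluster axis so each pixel's memberships sum to one.
        logits = centers @ pixels.T / np.sqrt(dim)  # (K, N)
        assign = softmax(logits, axis=0)
        # M-step: move each center to the assignment-weighted mean of the
        # pixel features (the mass normalization is an assumption here).
        mass = assign.sum(axis=1, keepdims=True)  # (K, 1)
        centers = (assign @ pixels) / np.maximum(mass, 1e-8)
    return centers, assign


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(64, 8))   # 64 "pixels" with 8-dim features
    seeds = rng.normal(size=(4, 8))    # 4 initial cluster centers
    new_centers, soft_masks = recurrent_cross_attention(seeds, feats)
    print(new_centers.shape, soft_masks.shape)
```

Taking the softmax over clusters rather than over pixels is what makes the centers compete for each pixel, which is the behaviour one expects from a clustering view of attention; a standard cross-attention layer would normalize over the pixel axis instead.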

Paper Structure (25 sections, 10 equations, 11 figures, 7 tables, 2 algorithms)

Introduction
Related Work
Methodology
Notation and Preliminary
ClustSeg
Implementation Details
Experiment
Experiment on Panoptic Segmentation
Experiment on Instance Segmentation
Experiment on Semantic Segmentation
Experiment on Superpixel Segmentation
Diagnostic Experiment
Conclusion
More Experimental Details
Panoptic Segmentation
...and 10 more sections

Figures (11)

Figure 1: ClustSeg unifies four segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) from the clustering view, and greatly surpasses existing specialized and unified models.
Figure 2: Dreamy-Start for query initialization. (a) To respect the cross-scene semantically consistent nature of semantic/stuff segmentation, the queries/seeds are initialized as class centers (Eq. \ref{eq:stuffquery}). (b) To meet the instance-aware demand of instance/thing segmentation, the initial seeds emerge from the input image (Eq. \ref{eq:thingquery}). (c) To generate a varying number of superpixels, the seeds are initialized from image grids (Eq. \ref{eq:superpixelquery}).
Figure 3: (a) Recurrent Cross-attention instantiates EM clustering for segment-by-clustering. (b) Each Recurrent Cross-attention layer executes $T$ iterations of clustering assignment (E-step) and center update (M-step). (c) Overall architecture of ClustSeg.
Figure 4: ClustSeg reaches the best ASA and CO scores on BSDS500 [arbelaez2011contour] test, among all the deep-learning-based superpixel models (see §\ref{sec:SuS} for details).
Figure 5: ClustSeg reaches the best ASA and CO scores on NYUv2 [silberman2012indoor] test (see §\ref{sec:sup} for details).
...and 6 more figures
