Generalized Category Discovery via Token Manifold Capacity Learning

Luyao Tang, Kunze Huang, Chaoqi Chen, Cheng Chen

TL;DR

This work addresses Generalized Category Discovery (GCD) by tackling the dimensional collapse that arises when compact clustering is enforced. It introduces Maximum Token Manifold Capacity (MTMC), which maximizes the nuclear norm of the class-token embeddings to enlarge intra-class manifold capacity, using the ViT [cls] token as a refined sample centroid guided by patch tokens. The approach is supported theoretically through connections to von Neumann entropy and manifold capacity, and it shows consistent, model-agnostic improvements on coarse- and fine-grained datasets, in both clustering accuracy and the accuracy of the estimated number of categories. In practice, MTMC is easy to implement (a three-line addition to the loss) and works as a plug-and-play enhancement for open-world learning, reducing dimensional collapse and improving inter-class separability across diverse GCD frameworks.

Abstract

Generalized category discovery (GCD) is essential for improving deep learning models' robustness in open-world scenarios by clustering unlabeled data containing both known and novel categories. Traditional GCD methods focus on minimizing intra-cluster variations, often sacrificing manifold capacity, which limits the richness of intra-class representations. In this paper, we propose a novel approach, Maximum Token Manifold Capacity (MTMC), that prioritizes maximizing the manifold capacity of class tokens to preserve the diversity and complexity of data. MTMC leverages the nuclear norm, the sum of singular values, as a measure of manifold capacity, ensuring that the representation of samples remains informative and well-structured. This method enhances the discriminability of clusters, allowing the model to capture detailed semantic features and avoid the loss of critical information during clustering. Through theoretical analysis and extensive experiments on coarse- and fine-grained datasets, we demonstrate that MTMC outperforms existing GCD methods, improving both clustering accuracy and the estimation of category numbers. The integration of MTMC leads to more complete representations, better inter-class separability, and a reduction in dimensional collapse, establishing MTMC as a vital component for robust open-world learning. Code is available at github.com/lytang63/MTMC.
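The nuclear-norm objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `mtmc_loss`, the `weight` parameter, and the toy batches are assumptions made for illustration, and the only idea taken from the text is that the nuclear norm of the class-token matrix is maximized.

```python
import numpy as np

def mtmc_loss(cls_tokens, weight=1.0):
    """Negative nuclear norm of a batch of class tokens (hypothetical helper).

    cls_tokens: array of shape (N, d), one [cls] embedding per sample.
    weight: hypothetical scaling factor for the regularizer.
    """
    # Singular values of the N x d class-token matrix.
    singular_values = np.linalg.svd(cls_tokens, compute_uv=False)
    # Nuclear norm = sum of singular values; negated so that
    # minimizing the loss maximizes the nuclear norm.
    return -weight * singular_values.sum()

rng = np.random.default_rng(0)

# Batch whose unit-norm embeddings spread over all 16 dimensions.
spread = rng.normal(size=(64, 16))
spread /= np.linalg.norm(spread, axis=1, keepdims=True)

# Collapsed batch: every unit-norm embedding lies along one direction.
direction = rng.normal(size=(1, 16))
collapsed = np.repeat(direction / np.linalg.norm(direction), 64, axis=0)

print("spread loss:   ", mtmc_loss(spread))
print("collapsed loss:", mtmc_loss(collapsed))
```

Both batches have the same Frobenius norm, yet the collapsed batch has a single nonzero singular value and therefore a much smaller nuclear norm. The sketch gives it the higher (worse) loss, which is the behavior a collapse-penalizing regularizer needs.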

Paper Structure

This paper contains 30 sections, 4 theorems, 16 equations, 7 figures, 4 tables.

Key Result

Theorem 1

For a given [cls] autocorrelation $\mathcal{A} =\mathbf{CLS}^{\top} \mathbf{CLS} / N \in \mathbb{R}^{d \times d}$ of rank $k$ ($\leq d$), $\log(\operatorname{rank}(\mathcal{A})) \geq \hat{H}(\mathcal{A})$, where equality holds if the eigenvalues of $\mathcal{A}$ are uniform, i.e. $\lambda_j = 1/k$ for $j = 1, \dots, k$ and $\lambda_j = 0$ for $j = k+1, \dots, d$.
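The relationship between the entropy of $\mathcal{A}$ and $\log k$ can be checked numerically. The sketch below assumes that the entropy in question is the von Neumann entropy $-\sum_j \lambda_j \log \lambda_j$ of the eigenvalues of $\mathcal{A}$, and that the rows of $\mathbf{CLS}$ are L2-normalized so that the eigenvalues sum to 1. The function name and the eigenvalue threshold `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np

def von_neumann_entropy(cls_tokens, eps=1e-10):
    """Return (entropy, rank) of A = CLS^T CLS / N (hypothetical helper)."""
    n = cls_tokens.shape[0]
    # Assumption: L2-normalize each row so that trace(A) = 1.
    x = cls_tokens / np.linalg.norm(cls_tokens, axis=1, keepdims=True)
    a = x.T @ x / n
    # A is symmetric positive semi-definite, so eigvalsh applies.
    eigvals = np.linalg.eigvalsh(a)
    # Keep eigenvalues above a small threshold; their count is the rank.
    eigvals = eigvals[eigvals > eps]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return entropy, len(eigvals)

# Equality case: tokens split evenly over k = 4 orthogonal directions in d = 8.
uniform = np.eye(8)[np.arange(8) % 4]
h_uniform, k_uniform = von_neumann_entropy(uniform)

# Generic case: random tokens give non-uniform eigenvalues.
rng = np.random.default_rng(0)
h_random, k_random = von_neumann_entropy(rng.normal(size=(32, 8)))

print(h_uniform, np.log(k_uniform))  # equal up to rounding
print(h_random, np.log(k_random))    # entropy stays below log(rank)
```

In the first case the four nonzero eigenvalues all equal $1/4$, so the entropy reaches $\log 4$. In the second the spectrum is uneven and the entropy falls strictly below $\log 8$.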

Figures (7)

  • Figure 1: (a) GCD is constrained by dimensional collapse due to strong clustering, leading to mixed class features and limited representational capacity. (b) MTMC enhances the class token manifold capacity, improving representational completeness and unlocking the model's full potential in the open world.
  • Figure 2: Overview of Maximum Token Manifold Capacity.
  • Figure 3: Comparison between $\log(\operatorname{rank}(\mathcal{A}))$ and $\hat{H}(\mathcal{A})$. The count of the largest eigenvalues necessary to account for 99% of the total eigenvalue energy serves as a surrogate for the rank.
  • Figure 4: Hyperparameter sensitivity of the degree of MTMC $\lambda$ and feature dimensionality $D$.
  • Figure 5: The Frobenius norm $\left\|\mathcal{A}-c \cdot I_d\right\|_F^2$ on three fine-grained benchmarks.
  • ...and 2 more figures
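The rank surrogate described in the Figure 3 caption (the number of largest eigenvalues needed to cover 99% of the total eigenvalue energy) can be sketched in a few lines of plain Python. The function name `energy_rank` and the example spectra are hypothetical. The sketch also takes "energy" to mean the plain sum of eigenvalues, which is an assumption about the caption's wording.

```python
def energy_rank(eigenvalues, energy=0.99):
    """Count the largest eigenvalues needed to reach the given energy share."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        # Stop as soon as the cumulative share reaches the threshold.
        if running >= energy * total:
            return count
    return len(ordered)

# A spectrum dominated by one direction versus a flat spectrum.
print(energy_rank([0.995, 0.003, 0.002]))     # 1
print(energy_rank([0.25, 0.25, 0.25, 0.25]))  # 4
```

A collapsed representation concentrates its energy in a few eigenvalues and gets a small surrogate rank. A flat spectrum needs every eigenvalue to reach the threshold.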

Theorems & Definitions (7)

  • Theorem 1
  • Lemma 1
  • Proof B.1
  • Lemma 2
  • Proof B.2
  • Theorem 2
  • Proof B.3