Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery

Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek

TL;DR

This work reconsiders the notion of a category by casting it as an optimization objective and develops InfoSieve, a self-supervised framework that learns binary category codes forming an implicit hierarchical tree for generalized category discovery. The method combines information-theoretic objectives (via algorithmic and Shannon mutual information) with code-length minimization and supervision signals to extract compact, structured category representations from unlabeled data. Theoretical justifications link the learning of category codes to optimal trees under certain assumptions. Empirically, InfoSieve achieves state-of-the-art results on fine-grained and open-world-like datasets while providing an interpretable hierarchical structure. By enabling test-time discovery of unknown categories with controllable granularity and without relying on fixed label sets, this approach offers a scalable pathway toward robust, human-agnostic categorization in real-world data.

Abstract

In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a category? In this paper, we conceptualize a category through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at https://github.com/SarahRastegar/InfoSieve.
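
To make the link between category codes and granularity concrete, here is a minimal Python sketch. It is not the authors' implementation: the sample names, the three-bit codes, and the helper `group_by_prefix` are all hypothetical, chosen only to show how reading fewer leading bits of a code yields coarser groups, while reading more bits yields finer ones.

```python
from collections import defaultdict


def group_by_prefix(codes, depth):
    """Group sample names by the first `depth` bits of their binary code."""
    groups = defaultdict(list)
    for name, code in codes.items():
        groups[code[:depth]].append(name)
    return dict(groups)


# Hypothetical codes, made up for illustration only.
codes = {
    "sparrow": "000",
    "robin": "001",
    "fruit bat": "010",
    "vampire bat": "011",
    "oak": "100",
}

coarse = group_by_prefix(codes, 1)  # one bit: two broad groups
finer = group_by_prefix(codes, 2)   # two bits: birds and bats separate

print(coarse)
print(finer)
```

With one bit, the four animals fall into a single group and the tree into another; with two bits, the birds and the bats separate. The code length therefore acts as a granularity knob in this toy setting.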

Paper Structure

This paper contains 37 sections, 8 theorems, 24 equations, 4 figures, 6 tables.

Key Result

Lemma 1

For each category $\mathrm{c}$ and for each $X^i$ with $C^i{=}\mathrm{c}$, we can find a binary decision tree $\mathcal{T}_{\mathrm{c}}$ that, starting from its root, reaches each $X^i$ by following the decision tree path. Based on this path, we assign the code $c(X^i){=}c^i_1c^i_2\cdots c^i_M$ to each $X^i$ to unique
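
The idea of reading a code off a root-to-leaf path can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the construction used in the paper: the function `assign_codes` is hypothetical, the split rule (cut the sample list in half) is an arbitrary stand-in for a learned decision, and the resulting codes have variable length rather than a fixed length $M$.

```python
def assign_codes(samples, prefix=""):
    """Return {sample: code}, where the code is the path from the root.

    Each recursive call is one tree node: going left appends "0",
    going right appends "1". The halving split is a toy assumption.
    """
    if len(samples) == 1:
        return {samples[0]: prefix}
    mid = len(samples) // 2
    codes = {}
    codes.update(assign_codes(samples[:mid], prefix + "0"))
    codes.update(assign_codes(samples[mid:], prefix + "1"))
    return codes


# Hypothetical samples of a single category.
samples = ["x1", "x2", "x3", "x4", "x5"]
codes = assign_codes(samples)

for name in samples:
    print(name, codes[name])
```

Because every sample ends at its own leaf, no two samples share a code, and no code is a prefix of another.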

Figures (4)

  • Figure 1: What is the correct category? Photo of a flying fox fruit bat. This image can be categorized as bat, bird, mammal, flying bat, and other categories. How should we define which answer is correct? This paper uses self-supervision to learn an implicit category code tree that reveals different levels of granularity in the data.
  • Figure 2: The implicit binary tree our model finds to address samples. Each leaf in the tree indicates a specific sample, and each node indicates the set of its descendants' samples. For instance, the node associated with '11...11' is the set of all birds with red beaks, while its parent is the set of all birds with red parts in their upper body.
  • Figure 3: InfoSieve framework. The Feature Extractor extracts an embedding by minimizing the contrastive loss $\mathcal{L}_{\text{C\_in}}$. The Code Generator uses these input embeddings to find category codes. The Code Masker simultaneously learns masks that minimize the code length with $\mathcal{L}_{\text{length}}$. Finally, the truncated category codes are used to minimize a contrastive loss for category codes while also predicting the seen categories by minimizing $\mathcal{L}_{\text{Cat}}$.
  • Figure 4: t-SNE plots of the different embeddings in our model. (a) Feature embedding: the embedding after the projection head, which the contrastive loss uses to maximize the representation information. (b) Label embedding: the embedding after generating code features, which is used by the unsupervised contrastive loss for codes. (c) Binary embedding: the embedding obtained by converting code features to a binary sequence using tanh activation functions and binary conditions. (d) Code embedding: the final truncated code, generated by assigning positional values to the binary sequence and truncating the resulting code using the masker network.
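
The steps named in Figure 4 (c) and (d) can be sketched as follows. This is a rough illustration only: the captions do not give the exact binary condition, positional weighting, or mask form, so the threshold at zero, the weights $2^{-(k+1)}$, and the hard prefix length `keep` (standing in for the learned masker output) are all assumptions, as are the function names.

```python
import math


def binarize(code_features, threshold=0.0):
    """Squash each feature with tanh, then apply a binary condition."""
    return [1 if math.tanh(v) > threshold else 0 for v in code_features]


def truncate(bits, keep):
    """Keep only the first `keep` bits (assumed stand-in for the mask)."""
    return bits[:keep]


def positional_value(bits):
    """Weight bit k by 2^-(k+1), so earlier bits matter more (assumed)."""
    return sum(b * 2.0 ** -(k + 1) for k, b in enumerate(bits))


# Hypothetical code features for one sample.
features = [2.1, -0.3, 0.8, -1.7, 0.05]

bits = binarize(features)        # binary sequence, as in panel (c)
short = truncate(bits, 3)        # truncated code, as in panel (d)
value = positional_value(short)  # scalar with positional weighting

print(bits, short, value)
```

Under this weighting, two codes that agree on their leading bits map to nearby values, which is one simple way to see why a truncated code can still preserve coarse category structure.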

Theorems & Definitions (10)

  • Lemma 1
  • Definition 1
  • Theorem 1
  • Lemma 2
  • Definition 2
  • Lemma 3
  • Theorem 1
  • Theorem 2
  • Lemma 3
  • Lemma 4