Utilization of Neighbor Information for Image Classification with Different Levels of Supervision

Gihan Jayatilaka, Abhinav Shrivastava, Matthew Gwilliam

TL;DR

The paper tackles the gap between fully supervised, semi-supervised (GCD), and unsupervised image recognition by proposing UNIC, a neighbor-information-driven framework that unifies clustering and GCD. It uses a DINO-based ViT backbone to mine positive and negative neighbors and finetunes end-to-end with neighbor-aware losses, adapting naturally to GCD by using ground-truth neighbors for labeled classes. A second-order neighbor cleaning strategy and a dedicated negative-neighbor mining component enable effective clustering with a single clustering head, achieving state-of-the-art results on ImageNet-100, ImageNet-200, CUB-200, Aircraft, and SCars for both clustering and GCD. The approach shows strong open-world recognition potential: carefully harnessed neighbor information can bridge supervised and unsupervised learning in image classification, with practical implications for scenarios with varying levels of labeling.

Abstract

We propose to bridge the gap between semi-supervised and unsupervised image recognition with a flexible method that performs well for both generalized category discovery (GCD) and image clustering. Despite the overlap in motivation between these tasks, the methods themselves are restricted to a single task -- GCD methods rely on the labeled portion of the data, and deep image clustering methods have no built-in way to leverage labels efficiently. We connect the two regimes with an innovative approach that Utilizes Neighbor Information for Classification (UNIC) in both the unsupervised (clustering) and semi-supervised (GCD) settings. State-of-the-art clustering methods already rely heavily on nearest neighbors. We improve on their results substantially in two parts: first, with a sampling and cleaning strategy that identifies accurate positive and negative neighbors, and second, by finetuning the backbone with clustering losses computed by sampling both types of neighbors. We then adapt this pipeline to GCD by using the labeled images as ground-truth neighbors. Our method yields state-of-the-art results for both clustering (+3% ImageNet-100, ImageNet-200) and GCD (+0.8% ImageNet-100, +5% CUB, +2% SCars, +4% Aircraft).
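To make the neighbor-mining idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes features are plain lists of floats (in practice they would come from the backbone), uses cosine similarity as the distance measure, and takes the `k_pos` most similar samples as positives and the `k_neg` least similar as negatives. The function names, the parameters `k_pos`, `k_neg`, and `max_size`, and the pruning direction (keeping anchors whose second-order neighborhood is small) are all assumptions made for illustration. The second-order neighborhood is taken here to be the union of the neighbor lists of an anchor's neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_neighbors(features, k_pos, k_neg):
    """Return (positives, negatives), each a per-anchor list of sample indices.

    positives[i]: the k_pos samples most similar to anchor i (i excluded).
    negatives[i]: the k_neg samples least similar to anchor i.
    """
    n = len(features)
    positives, negatives = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Order the other samples from most to least similar to anchor i.
        others.sort(key=lambda j: cosine(features[i], features[j]), reverse=True)
        positives.append(others[:k_pos])
        negatives.append(others[max(0, len(others) - k_neg):])
    return positives, negatives


def second_order_size(positives, i):
    """Size of the union of the neighbor lists of anchor i's positive neighbors."""
    union = set()
    for j in positives[i]:
        union.update(positives[j])
    return len(union)


def prune_anchors(positives, max_size):
    """Keep anchors whose second-order neighborhood has at most max_size members.

    ASSUMPTION: compact neighborhoods are treated as more reliable; the
    direction of this comparison is illustrative and not taken from the paper.
    """
    return [i for i in range(len(positives))
            if second_order_size(positives, i) <= max_size]


# Toy usage: two tight pairs of 2-D features.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
positives, negatives = mine_neighbors(features, k_pos=1, k_neg=1)
```

In this toy example each sample's positive neighbor is the other member of its pair, and its negative neighbor comes from the opposite pair. A real pipeline would replace the quadratic loop with a batched similarity search.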

Paper Structure

This paper contains 32 sections, 9 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Unifying Clustering and GCD. We observe that the goals of image clustering and generalized category discovery (GCD) are identical; they differ only slightly in terms of supervision (top). Therefore, we propose a clustering approach based on mining positive and negative neighbors, which belong to the same class as an anchor and to a different class, respectively (bottom left). We can extend this approach to GCD by using the ground-truth labels for "perfect" neighbors (bottom right).
  • Figure 2: Neighbor Mining for UNIC. We extract features from a backbone, take the closest samples as "positive" neighbors, and some of the far samples for "negative" neighbors. We then prune some positive neighbors, depending on the number of mutual nearest neighbors (union of nearest neighbors of nearest neighbors).
  • Figure 3: UNIC. We first mine neighbors (see Figure 2). We finetune the backbone with a classification head that we train without labels, using losses that encourage the model to predict the same class for positive neighbors, different classes for negative neighbors, and entropy for regularization.
  • Figure 4: True positives vs. second-order neighborhood size. A threshold of 70 significantly reduces false positives (red) while conserving usable data points (blue).
  • Figure 5: Convergence behavior for clustering on ImageNet-50. We compare batch sizes, finetuning levels (frozen, full-finetune, last block), and clustering heads (MLP, fully-connected, self-attention).
  • ...and 11 more figures
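
The three loss terms named in the Figure 3 caption (agreement with positive neighbors, disagreement with negative neighbors, and an entropy regularizer) can be sketched as follows. This is an illustrative formulation in plain Python, not the paper's exact equations: the log-of-dot-product form, the `EPS` constant, the `entropy_weight` parameter, and all function names are assumptions.

```python
import math

EPS = 1e-8  # numerical guard inside the logarithms (assumed value)


def softmax(logits):
    """Convert a list of logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def positive_loss(p_anchor, p_pos):
    """Low when anchor and positive neighbor put mass on the same cluster."""
    return -math.log(dot(p_anchor, p_pos) + EPS)


def negative_loss(p_anchor, p_neg):
    """Low when anchor and negative neighbor put mass on different clusters."""
    return -math.log(1.0 - dot(p_anchor, p_neg) + EPS)


def negative_entropy(batch_probs):
    """Negative entropy of the batch-mean prediction.

    Minimizing it spreads assignments across clusters and discourages the
    degenerate solution where every sample lands in one cluster.
    """
    n = len(batch_probs)
    num_classes = len(batch_probs[0])
    mean = [sum(p[c] for p in batch_probs) / n for c in range(num_classes)]
    return sum(m * math.log(m + EPS) for m in mean)


def total_loss(triplets, entropy_weight=1.0):
    """Average the terms over (anchor, positive, negative) probability triplets."""
    n = len(triplets)
    pos = sum(positive_loss(a, p) for a, p, _ in triplets) / n
    neg = sum(negative_loss(a, q) for a, _, q in triplets) / n
    ent = negative_entropy([a for a, _, _ in triplets])
    return pos + neg + entropy_weight * ent


# Toy usage: predictions over three clusters.
anchor = softmax([5.0, 0.0, 0.0])
same_cluster = softmax([4.0, 0.0, 0.0])
other_cluster = softmax([0.0, 5.0, 0.0])
loss = total_loss([(anchor, same_cluster, other_cluster)])
```

In a training loop these terms would be computed on the classification head's outputs with an autodiff framework; the plain-Python version only shows how each term behaves.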