Learning from Neighbors: Category Extrapolation for Long-Tail Learning

Shizhen Zhao, Xin Wen, Jiahui Liu, Chuofan Ma, Chunfeng Yuan, Xiaojuan Qi

TL;DR

The paper tackles long-tail visual recognition by increasing data granularity through category extrapolation: introducing open-set auxiliary categories that are visually related to target classes. An automated pipeline uses LLMs to identify fine-grained neighbor categories and web crawling to collect images, followed by a neighbor-silencing loss to prevent auxiliary data from overshadowing the target task. At inference, auxiliary weights are masked, preserving only target-class predictions. Across ImageNet-LT, iNaturalist 2018, and Places-LT, the approach yields consistent improvements over BalCE baselines under multiple pretraining regimes, often achieving substantial gains on tail categories and offering robust generalization without additional classifier rebalancing.

Abstract

Balancing training on long-tail data distributions remains a long-standing challenge in deep learning. While methods such as re-weighting and re-sampling help alleviate the imbalance issue, limited sample diversity continues to hinder models from learning robust and generalizable feature representations, particularly for tail classes. In contrast to existing methods, we offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance. In this paper, we investigate this phenomenon through both quantitative and qualitative studies, showing that increased granularity enhances the generalization of learned features in tail categories. Motivated by these findings, we propose a method to increase dataset granularity through category extrapolation. Specifically, we introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes. This forms the core contribution and insight of our approach. To automate the curation of auxiliary data, we leverage large language models (LLMs) as knowledge bases to search for auxiliary categories and retrieve relevant images through web crawling. To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss that encourages the model to focus on class discrimination within the target dataset. During inference, the classifier weights for auxiliary categories are masked out, leaving only the target class weights for use. Extensive experiments and ablation studies on three standard long-tail benchmarks demonstrate the effectiveness of our approach, notably outperforming strong baseline methods that use the same amount of data. The code will be made publicly available.
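
The inference-time masking described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the classifier emits one logit per class and that the target classes occupy the first `num_target` positions, with auxiliary classes after them; the function name `masked_predict` and the example logits are hypothetical and not taken from the paper.

```python
import math


def masked_predict(logits, num_target):
    """Predict over target classes only, ignoring auxiliary-class logits.

    Assumption (not from the paper): target classes are stored first,
    auxiliary classes after them.
    """
    # Mask out auxiliary classes by keeping only the target logits.
    target_logits = logits[:num_target]

    # Numerically stable softmax over the remaining target logits.
    peak = max(target_logits)
    exps = [math.exp(z - peak) for z in target_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Index of the most probable target class.
    pred = max(range(num_target), key=lambda i: probs[i])
    return pred, probs


# Hypothetical logits: 3 target classes followed by 2 auxiliary classes.
# The largest logit overall belongs to an auxiliary class (index 3).
logits = [1.0, 2.0, 0.5, 3.0, -1.0]
pred, probs = masked_predict(logits, num_target=3)
print(pred, [round(p, 3) for p in probs])
```

In this toy input the highest raw logit belongs to an auxiliary class, yet the prediction falls on a target class because the auxiliary entries never enter the softmax.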

Paper Structure

This paper contains 14 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Holistic comparison to previous philosophy. (a) Data imbalance between head and tail classes leads to biased features; (b, c) Previous methods are still bounded by existing known classes; (d) We instead seek help from auxiliary open-set data.
  • Figure 2: Feature visualization of confusing head and tail classes by UMAP [mcinnes2020umap] on ImageNet-LT [liu2019oltr]. (a) Raw feature space of training data by DINOv2 [oquab2023dinov2]; (b) Feature space of training data after the training phase; (c) The baseline (re-weighting) shows poor generalization on validation data; (d) Adding auxiliary categories condenses clusters and improves separation.
  • Figure 3: Effect of granularity vs. imbalance ratio.
  • Figure 4: Data crawling pipeline. We prompt GPT-4 [openai2023gpt4] for fine-grained categories related to query classes and retrieve corresponding images from the web. Classes already in the label set and images of lower visual similarity than the threshold are filtered out.
  • Figure 5: Ablation study on factors related to the curation of auxiliary dataset. Experiments are conducted on ImageNet-LT [liu2019oltr]. Default options are marked in red.
  • ...and 1 more figures
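
The two filtering rules in the Figure 4 caption (drop classes already in the label set, drop images below a visual-similarity threshold) might be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how similarity is measured, so cosine similarity against a single query-class feature vector is assumed here, and the function names, the toy features, and the threshold value 0.8 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_auxiliary(candidates, label_set, query_feature, threshold):
    """Keep candidate classes and images that pass both filters.

    candidates: dict mapping class name -> list of (image_id, feature).
    Assumption: similarity is cosine similarity to `query_feature`.
    """
    known = {name.lower() for name in label_set}
    kept = {}
    for name, images in candidates.items():
        # Filter 1: skip classes that already exist in the target label set.
        if name.lower() in known:
            continue
        # Filter 2: keep only images similar enough to the query class.
        selected = [
            image_id
            for image_id, feature in images
            if cosine(feature, query_feature) >= threshold
        ]
        if selected:
            kept[name] = selected
    return kept


# Hypothetical toy data with 2-D features.
label_set = {"tabby cat"}
query_feature = [1.0, 0.0]
candidates = {
    "Tabby cat": [("a", [1.0, 0.0])],  # already a target class
    "bengal cat": [("b", [0.9, 0.1]), ("c", [0.0, 1.0])],
}
kept = filter_auxiliary(candidates, label_set, query_feature, threshold=0.8)
print(kept)
```

Here the duplicate class is removed by name, and image "c" is removed because its feature is orthogonal to the query feature.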