Twofold Debiasing Enhances Fine-Grained Learning with Coarse Labels

Xin-yang Zhao, Jian Jin, Yang-yang Li, Yazhou Yao

TL;DR

This work tackles Coarse-to-Fine Few-Shot (C2FS) learning, where coarse supervision can suppress fine-grained cues and scarce fine-grained samples induce biased classifier distributions. It introduces Twofold Debiasing (TFB), which combines embedding-learning debiasing (multi-layer feature fusion for reconstruction plus intermediate-layer feature alignment) with fine-grained classifier debiasing (prototype calibration using base-class relationships) to improve both the representation and the decision boundaries. Across BREEDS and CIFAR-100, TFB achieves state-of-the-art results, with a notably large gain on CIFAR-100, which supports optimizing both the feature extractor and the classifier for C2FS. These results suggest that using coarse-label information for distribution calibration, alongside richer multi-layer representations, yields robust fine-grained recognition under limited supervision and is relevant to practical settings where fine-grained labels are scarce.

Abstract

The Coarse-to-Fine Few-Shot (C2FS) task trains models using only coarse labels, then uses a limited number of subclass samples to achieve fine-grained recognition. This task presents two main challenges: coarse-grained supervised pre-training suppresses the extraction of fine-grained features that are critical for subcategory discrimination, and models overfit because the limited fine-grained samples yield biased distributions. In this paper, we propose the Twofold Debiasing (TFB) method, which addresses these challenges through detailed feature enhancement and distribution calibration. Specifically, we introduce a multi-layer feature fusion reconstruction module and an intermediate-layer feature alignment module to combat the model's tendency to focus on simple predictive features directly related to coarse-grained supervision while neglecting complex fine-grained details. Furthermore, we mitigate the biased distributions learned by the fine-grained classifier using readily available coarse-grained sample embeddings enriched with fine-grained information. Extensive experiments conducted on five benchmark datasets demonstrate the efficacy of our approach, achieving state-of-the-art results that surpass competitive methods.
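
To make the distribution-calibration idea concrete, the following is a minimal sketch rather than the authors' procedure. It assumes that a few-shot prototype (the mean of the support embeddings) is pulled toward the mean of its nearest training-set embeddings. The function name `calibrate_prototype`, the neighbour count `k`, and the blending weight `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np

def calibrate_prototype(support, base_feats, k=5, alpha=0.5):
    """Blend a few-shot prototype with nearby training-set features.

    support    : (n_shot, d) embeddings of one fine-grained class
    base_feats : (n_base, d) embeddings from the coarse-labeled training set
    k, alpha   : hypothetical hyperparameters (assumptions, not from the paper)
    """
    # Biased estimate: the mean of only a few support embeddings.
    proto = support.mean(axis=0)
    # Euclidean distance from the prototype to every training-set embedding.
    dists = np.linalg.norm(base_feats - proto, axis=1)
    # Indices of the k closest training-set embeddings.
    nearest = np.argsort(dists)[:k]
    neighbor_mean = base_feats[nearest].mean(axis=0)
    # Convex combination of the raw prototype and its neighbourhood mean.
    return alpha * proto + (1.0 - alpha) * neighbor_mean
```

With `alpha = 1.0` the sketch returns the uncalibrated prototype, so the weight controls how strongly the training-set features correct the few-shot estimate.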

Paper Structure

This paper contains 19 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The Coarse-to-Fine Few-Shot (C2FS) task trains a feature extractor on a training set with coarse labels. During testing, the classifier uses a support set containing a limited number of fine-labeled samples to achieve fine-grained recognition.
  • Figure 2: We froze the model trained under coarse-grained supervision and attached a classification model to the output of each layer. These classification models were then trained under fine-grained supervision to test how well each layer's output supports fine-grained classification.
  • Figure 3: Overview of the TFB model training framework. The training objective consists of four components: coarse-grained cross-entropy loss $\mathcal{L}_{CE}$ for classifying coarse categories, reconstruction loss $\mathcal{L}_{rec}$ and alignment loss $\mathcal{L}_{align}$ to enhance the fine-grained representation capabilities of embeddings, and contrastive learning loss $\mathcal{L}_{cont}$ to learn better feature distributions within coarse category spaces. During testing, only the backbone is used as the feature extractor.
  • Figure 4: The multi-layer feature fusion branch for reconstruction and the intermediate-layer feature alignment branch. The proposed fusion module employs cascaded concatenation operations and convolution layers to facilitate adaptive feature aggregation across different scales. $\mathbf{z}^{(i)}$ represents the rescaled $\mathbf{f}^{(i)}$.
  • Figure 5: Left: training a classifier with limited fine-grained samples leads to overfitting. Middle: the distribution of few-shot samples in the feature space is corrected using training-set features. Right: the classifier trained on the calibrated features generalizes better.
  • ...and 1 more figure
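
The cascaded concatenation and convolution described for the fusion module in Figure 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's architecture: nearest-neighbour resizing stands in for the rescaling that produces $\mathbf{z}^{(i)}$ from $\mathbf{f}^{(i)}$, a 1x1 convolution stands in for the convolution layers, and the shallow-to-deep fusion order, the function names, and the random weights are all hypothetical.

```python
import numpy as np

def rescale(f, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = f.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return f[:, rows][:, :, cols]

def conv1x1(x, weight):
    """1x1 convolution: mix channels with a (C_out, C_in) matrix."""
    return np.einsum("oc,chw->ohw", weight, x)

def cascaded_fusion(feats, weights, size):
    """Fuse maps one at a time: concatenate on channels, then convolve."""
    z = [rescale(f, size) for f in feats]  # z[i] is the rescaled feats[i]
    fused = z[0]
    for z_i, w in zip(z[1:], weights):
        fused = conv1x1(np.concatenate([fused, z_i], axis=0), w)
    return fused

# Toy usage: three layers with different channel counts and resolutions.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8, 8)),
         rng.normal(size=(8, 4, 4)),
         rng.normal(size=(16, 2, 2))]
weights = [rng.normal(size=(8, 4 + 8)),    # fuses z[0] with z[1]
           rng.normal(size=(8, 8 + 16))]   # fuses the result with z[2]
fused = cascaded_fusion(feats, weights, size=4)
print(fused.shape)  # (8, 4, 4)
```

In a trained model the convolution weights would be learned jointly with the reconstruction objective; here they are random only so that the shapes can be checked.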