Unleash the Power of Local Representations for Few-Shot Classification

Shi Tang, Guiming Luo, Xinchen Ye, Zhiyi Xia

TL;DR

This work tackles the challenge of generalizing to novel classes in few-shot classification by leveraging local representations. It introduces FCAM, which pairs Feature Calibration (soft-label supervision of randomly cropped patches with a UniCon KL-Divergence) with an Adaptive Metric (entropy-regularized optimal transport whose regularization coefficient is tuned by a Modulate Module to adapt to different compositions of local feature sets). The method achieves state-of-the-art results on miniImageNet, tieredImageNet, and CUB, including cross-domain and fine-grained scenarios, demonstrating the effectiveness of soft-label pretraining and adaptive, transport-based matching for few-shot generalization. The approach offers practical gains in robustness and transferability, while also highlighting a trade-off with computational cost, which is tied to the number of patches used.

Abstract

Generalizing to novel classes unseen during training is a key challenge of few-shot classification. Recent metric-based methods try to address this with local representations. However, they are unable to take full advantage of them due to (i) improper supervision for pretraining the feature extractor, and (ii) a lack of adaptability in the metric for handling the various possible compositions of local feature sets. In this work, we unleash the power of local representations in improving novel-class generalization. For the feature extractor, we design a novel pretraining paradigm that learns from randomly cropped patches with soft labels. It exploits the class-level diversity of patches while diminishing the impact of their semantic misalignment with hard labels. To align the network output with soft labels, we also propose a UniCon KL-Divergence that emphasizes the equal contribution of each base class in describing "non-base" patches. For the metric, we formulate the comparison of local feature sets as an entropy-regularized optimal transport problem, which gives it the ability to handle sets consisting of homogeneous elements. Furthermore, we design a Modulate Module to endow the metric with the necessary adaptability. Our method achieves new state-of-the-art performance on three popular benchmarks. Moreover, it exceeds state-of-the-art transductive and cross-modal methods in the fine-grained scenario.
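The metric described above treats each image as a set of local features and compares two sets through entropy-regularized optimal transport. The sketch below shows only that core computation, using plain Sinkhorn iterations in NumPy. It is not the authors' Adaptive Metric: the cosine cost, the uniform marginals, the fixed regularization weight `eps`, and the helper names (`cosine_cost`, `sinkhorn`, `set_distance`) are illustrative assumptions, and the Modulate Module that tunes the regularization coefficient is not modeled.

```python
import numpy as np


def cosine_cost(X, Y):
    """Pairwise cost 1 - cos(x_i, y_j) between two sets of local features (assumed cost)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T


def sinkhorn(C, eps=0.1, n_iter=500):
    """Entropy-regularized OT plan with uniform marginals via Sinkhorn scaling."""
    n, m = C.shape
    a = np.full(n, 1.0 / n)           # mass of each query patch (assumed uniform)
    b = np.full(m, 1.0 / m)           # mass of each support patch (assumed uniform)
    K = np.exp(-C / eps)              # Gibbs kernel; eps is fixed here, not modulated
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)               # rescale rows toward marginal a
        v = b / (K.T @ u)             # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def set_distance(X, Y, eps=0.1):
    """Transport cost <P, C> between two local feature sets (hypothetical helper)."""
    C = cosine_cost(X, Y)
    P = sinkhorn(C, eps)
    return float((P * C).sum()), P


# Toy usage: 9 random 64-d "patch embeddings" per image (synthetic data).
rng = np.random.default_rng(0)
query = rng.normal(size=(9, 64))
same = query[rng.permutation(9)] + 0.1 * rng.normal(size=(9, 64))  # perturbed, reordered copy
other = rng.normal(size=(9, 64))                                     # unrelated set
d_same, P_same = set_distance(query, same)
d_other, _ = set_distance(query, other)
print(f"same-set distance {d_same:.4f}, unrelated-set distance {d_other:.4f}")
```

Each Sinkhorn iteration costs O(nm) for sets of n and m patches, which is one way to see why the computational cost is tied to the number of patches.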

Paper Structure (20 sections, 1 theorem, 20 equations, 6 figures, 8 tables, 1 algorithm)

Key Result

Theorem 1

Marking the variables calculated using the teacher output $\mathbf{z}^\mathcal{T}$ and student output $\mathbf{z}^\mathcal{S}$ with the superscripts $\mathcal{T}$ and $\mathcal{S}$, respectively, the classical KL-Divergence for soft label supervision can be reformulated as:
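The reformulated expression of Theorem 1 is not reproduced in this summary. For reference, the sketch below computes only the classical, temperature-softened KL-Divergence that the theorem starts from, assuming the standard form $\mathrm{KL}(\mathbf{p}^\mathcal{T}\,\|\,\mathbf{p}^\mathcal{S})$ with $\mathbf{p} = \mathrm{softmax}(\mathbf{z}/\tau)$ computed from the teacher output $\mathbf{z}^\mathcal{T}$ and student output $\mathbf{z}^\mathcal{S}$. The temperature `tau` is the quantity varied in Figure 4. This is not the proposed UniCon KL-Divergence, and the function names and toy logits are hypothetical.

```python
import numpy as np


def log_softmax(z, tau=1.0):
    """Numerically stable log-softmax of logits z softened by temperature tau."""
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def soft_label_kl(z_teacher, z_student, tau=1.0):
    """Batch-mean KL(p^T || p^S) with p = softmax(z / tau); classical form, not UniCon."""
    log_p_t = log_softmax(z_teacher, tau)
    log_p_s = log_softmax(z_student, tau)
    p_t = np.exp(log_p_t)
    return float((p_t * (log_p_t - log_p_s)).sum(axis=-1).mean())


# Toy logits over 4 base classes for 2 cropped patches (synthetic values).
z_t = np.array([[4.0, 1.0, 0.5, 0.2],
                [0.3, 0.2, 0.4, 0.1]])   # second patch: nearly flat teacher distribution
z_s = np.array([[2.0, 2.0, 0.0, 0.0],
                [1.5, 0.0, 0.0, 0.0]])
for tau in (1.0, 4.0):
    print(f"tau={tau}: KL = {soft_label_kl(z_t, z_s, tau):.4f}")
```

Raising `tau` flattens both distributions, which shifts weight in the loss toward the non-maximal classes; Figure 4 examines how such temperature choices affect accuracy on CUB-200-2011.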

Figures (6)

  • Figure 1: (a) Hard labels could provide false supervision since random cropping may alter the semantics. Describing patches by analogy, soft labels can avoid this and utilize the class-level diversity provided by random cropping. The matching flows between two sets of similar local patches using (b) EMD and (c) our Adaptive Metric.
  • Figure 2: Overview of our framework ($3$-way $2$-shot as an example).
  • Figure 3: Illustration of (a) the continuous binary classification process corresponding to the reformulation of KL-Divergence, and (b) the proposed Adaptive Metric, which formulates the measurement process as an optimal transport (OT) problem. To handle various set compositions, the adjustment coefficient of the entropy regularization is tuned by a Modulate Module.
  • Figure 4: Gaussian smoothed $1$-shot test accuracy curves on CUB-200-2011 during feature calibration, with different temperatures to adjust the weighting scheme of the classical KL-Divergence. The results of the same $1000$ tasks are averaged for each data point.
  • Figure 6: Visualization of solved transport matrices. Results of (a) EMD and (b) Adaptive Metric for sets consisting of similar local features, and the result of (c) Adaptive Metric for sets consisting of dissimilar local features.
  • ...and 1 more figure
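Figures 1 and 6 contrast the matching flows of EMD and the Adaptive Metric on sets of similar local features. A rough way to see the role of the entropy term is to solve entropy-regularized OT on such sets at several regularization strengths and measure how spread out the resulting plan is. The sketch below does this with plain Sinkhorn iterations; the synthetic features, the cosine cost, and the `eps` values are assumptions, a very small `eps` only approximates EMD, and nothing here reproduces the Modulate Module, which in the paper tunes this coefficient.

```python
import numpy as np


def sinkhorn_plan(C, eps, n_iter=2000):
    """Entropy-regularized OT plan with uniform marginals (plain Sinkhorn scaling)."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def plan_entropy(P):
    """Shannon entropy of a transport plan; higher means mass is spread over more pairs."""
    p = P[P > 0]
    return float(-(p * np.log(p)).sum())


# Two sets of 6 "similar" local features: small perturbations of one shared vector.
rng = np.random.default_rng(1)
base = rng.normal(size=64)
X = base + 0.3 * rng.normal(size=(6, 64))
Y = base + 0.3 * rng.normal(size=(6, 64))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
C = 1.0 - Xn @ Yn.T                      # cosine cost; all entries are small and close

entropies = {eps: plan_entropy(sinkhorn_plan(C, eps)) for eps in (0.01, 0.1, 1.0)}
for eps, h in entropies.items():
    print(f"eps={eps}: plan entropy {h:.3f} (uniform plan: {np.log(36):.3f})")
```

With a small `eps` the plan concentrates mass on a few pairs, much like the near one-to-one flows of EMD; larger values spread each patch's mass across many similar patches. The entropy of the optimal regularized plan is non-decreasing in `eps`, which is why a coefficient adjusted per set composition can trade sharp matching against smooth matching.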

Theorems & Definitions (1)

  • Theorem 1