Unleash the Power of Local Representations for Few-Shot Classification
Shi Tang, Guiming Luo, Xinchen Ye, Zhiyi Xia
TL;DR
This work tackles the challenge of generalizing to novel classes in few-shot classification by leveraging local representations. It introduces FCAM, combining Feature Calibration with soft-label supervision and UniCon KL-Divergence, and an Adaptive Metric based on entropy-regularized optimal transport with a Modulate Module to adapt to different local feature-set compositions. The method achieves state-of-the-art results on miniImageNet, tieredImageNet, and CUB, including cross-domain and fine-grained scenarios, demonstrating the effectiveness of soft-label pretraining and adaptive, transport-based matching for few-shot generalization. The approach offers practical gains in robustness and transferability, while also highlighting a trade-off with computational cost tied to the number of patches used.
Abstract
Generalizing to novel classes unseen during training is a key challenge of few-shot classification. Recent metric-based methods try to address this with local representations. However, they cannot take full advantage of them because of (i) improper supervision for pretraining the feature extractor, and (ii) a lack of adaptability in the metric for handling the various possible compositions of local feature sets. In this work, we unleash the power of local representations in improving novel-class generalization. For the feature extractor, we design a novel pretraining paradigm that learns from randomly cropped patches using soft labels. It exploits the class-level diversity of patches while diminishing the impact of their semantic misalignment with hard labels. To align the network output with soft labels, we also propose a UniCon KL-Divergence that emphasizes the equal contribution of each base class in describing "non-base" patches. For the metric, we formulate the comparison of local feature sets as an entropy-regularized optimal transport problem, which introduces the ability to handle sets consisting of homogeneous elements. Furthermore, we design a Modulate Module to endow the metric with the necessary adaptability. Our method achieves new state-of-the-art performance on three popular benchmarks. Moreover, it exceeds state-of-the-art transductive and cross-modal methods in the fine-grained scenario.
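To make the idea of comparing two local feature sets through entropy-regularized optimal transport concrete, the following is a minimal sketch using generic Sinkhorn iterations in NumPy. It is not the paper's Adaptive Metric and does not include the Modulate Module. The function names (`sinkhorn_plan`, `local_set_similarity`), the cosine-based cost, the uniform marginals, and the values of `eps` and `n_iters` are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.1, n_iters=200):
    """Approximate the entropy-regularized OT plan between histograms a and b.

    Generic Sinkhorn iterations; eps is the regularization strength
    (an assumed value, not taken from the paper).
    """
    K = np.exp(-cost / eps)              # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)

def local_set_similarity(query_feats, support_feats, eps=0.1, n_iters=200):
    """Score two sets of local features by transport-weighted cosine similarity.

    Assumptions for illustration: cost = 1 - cosine similarity,
    and every local feature carries equal (uniform) mass.
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    sim = q @ s.T                        # pairwise cosine similarities
    cost = 1.0 - sim
    a = np.full(q.shape[0], 1.0 / q.shape[0])
    b = np.full(s.shape[0], 1.0 / s.shape[0])
    plan = sinkhorn_plan(cost, a, b, eps, n_iters)
    return float((plan * sim).sum()), plan

# Toy usage: 5 query patches and 7 support patches with 16-dim random features.
rng = np.random.default_rng(0)
query = rng.normal(size=(5, 16))
support = rng.normal(size=(7, 16))
score, plan = local_set_similarity(query, support)
```

In this sketch, a smaller `eps` concentrates the plan on the best-matching patch pairs, while a larger `eps` spreads the mass more evenly across pairs. A fixed choice like this is the kind of rigidity that an adaptive component would be meant to relax.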
