Simple Semantic-Aided Few-Shot Learning
Hai Zhang, Junzhe Xu, Shanlin Jiang, Zhenan He
TL;DR
This work tackles the fragility of using naive semantics in Few-Shot Learning. It proposes Semantic Evolution, which automatically generates high-quality, paraphrased semantics from class names, and a lightweight Semantic Alignment Network (SemAlign), which fuses visual features with these semantics to reconstruct robust class prototypes. The approach achieves state-of-the-art results across six benchmarks, including cross-domain tasks, and ablation analyses show the critical role of semantic quality and the benefit of multimodal fusion over purely visual or purely semantic signals. The findings suggest that well-crafted semantic representations enable simpler, more robust few-shot classifiers, which is of practical relevance for settings with limited labeled data and diverse domains. The released code further facilitates adoption and replication.
Abstract
Learning from a limited amount of data, namely Few-Shot Learning, is a challenging computer vision task. Several works exploit semantics and design complicated semantic fusion mechanisms to compensate for the scarcity of representative features in limited data. However, relying on naive semantics such as class names introduces biases because of their brevity, while acquiring extensive semantics from external knowledge requires considerable time and effort. This limitation severely constrains the potential of semantics in Few-Shot Learning. In this paper, we design an automatic method called Semantic Evolution to generate high-quality semantics. The incorporation of high-quality semantics alleviates the need for the complex network structures and learning algorithms used in previous works. Hence, we employ a simple two-layer network, termed Semantic Alignment Network, to transform semantics and visual features into robust class prototypes with rich discriminative features for few-shot classification. Experimental results show that our framework outperforms all previous methods on six benchmarks, demonstrating that a simple network with high-quality semantics can beat intricate multimodal modules on few-shot classification tasks. Code is available at https://github.com/zhangdoudou123/SemFew.
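To make the pipeline in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each class's visual prototype is the mean of its support features, that the "two-layer network" is an MLP over the concatenation [visual prototype; semantic embedding], and that queries are assigned to the nearest fused prototype by cosine similarity. The names SemAlignSketch, class_prototypes, cosine_logits and classify, the LeakyReLU activation, and all dimensions are hypothetical. The semantic vectors would in practice come from encoding the Semantic Evolution descriptions with a text encoder, which is not specified here, so random vectors stand in for them. The weights are untrained; the training objective is not described in this abstract and is omitted.

```python
import numpy as np


def leaky_relu(x, slope=0.01):
    # Hypothetical choice of activation for the hidden layer.
    return np.where(x > 0, x, slope * x)


class SemAlignSketch:
    """Hypothetical two-layer fusion network: [visual; semantic] -> prototype."""

    def __init__(self, d_visual, d_semantic, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        d_in = d_visual + d_semantic
        # Randomly initialized (untrained) weights, for illustration only.
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), size=(d_hidden, d_visual))
        self.b2 = np.zeros(d_visual)

    def __call__(self, visual, semantic):
        # Concatenate visual and semantic features, then apply two linear layers.
        x = np.concatenate([visual, semantic], axis=-1)
        h = leaky_relu(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # output lives in the visual feature space


def class_prototypes(support, labels, n_way):
    # Visual prototype of each class = mean of its support features.
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_way)])


def cosine_logits(queries, prototypes):
    # Cosine similarity between every query and every prototype.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return q @ p.T


def classify(net, support, labels, semantics, queries):
    n_way = semantics.shape[0]
    visual_protos = class_prototypes(support, labels, n_way)
    fused_protos = net(visual_protos, semantics)  # semantic-aided prototypes
    return cosine_logits(queries, fused_protos).argmax(axis=1)


# Usage on a synthetic 5-way 1-shot episode (random stand-in features).
rng = np.random.default_rng(1)
n_way, k_shot, d_v, d_s = 5, 1, 64, 32
support = rng.normal(size=(n_way * k_shot, d_v))
labels = np.repeat(np.arange(n_way), k_shot)
semantics = rng.normal(size=(n_way, d_s))  # placeholder for encoded class descriptions
queries = rng.normal(size=(10, d_v))
net = SemAlignSketch(d_v, d_s, d_hidden=128)
preds = classify(net, support, labels, semantics, queries)
```

Because the network maps the fused input back into the visual feature space, the same cosine nearest-prototype rule works whether or not semantics are used. That makes it straightforward to compare purely visual prototypes with fused ones in an ablation.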
