Simple Semantic-Aided Few-Shot Learning

Hai Zhang, Junzhe Xu, Shanlin Jiang, Zhenan He

TL;DR

This work addresses the fragility of naive semantics in Few-Shot Learning with two components. Semantic Evolution automatically generates high-quality, paraphrased semantics from class names, and a lightweight Semantic Alignment Network (SemAlign) fuses visual features with these semantics to reconstruct robust class prototypes. The approach achieves state-of-the-art results on six benchmarks, including cross-domain tasks. Ablation analyses show that semantic quality is critical and that multimodal fusion outperforms purely visual or purely semantic signals. The findings suggest that well-crafted semantic representations allow simpler, more robust models for few-shot classification, which matters in settings with limited labeled data and diverse domains. The code is publicly available, which supports adoption and replication.

Abstract

Learning from a limited amount of data, namely Few-Shot Learning, stands out as a challenging computer vision task. Several works exploit semantics and design complicated semantic fusion mechanisms to compensate for rare representative features within restricted data. However, relying on naive semantics such as class names introduces biases due to their brevity, while acquiring extensive semantics from external knowledge takes considerable time and effort. This limitation severely constrains the potential of semantics in Few-Shot Learning. In this paper, we design an automatic method called Semantic Evolution to generate high-quality semantics. The incorporation of high-quality semantics alleviates the need for the complex network structures and learning algorithms used in previous works. Hence, we employ a simple two-layer network termed Semantic Alignment Network to transform semantics and visual features into robust class prototypes with rich discriminative features for few-shot classification. The experimental results show that our framework outperforms all previous methods on six benchmarks, demonstrating that a simple network with high-quality semantics can beat intricate multi-modal modules on few-shot classification tasks. Code is available at https://github.com/zhangdoudou123/SemFew.

Paper Structure (19 sections, 6 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Overview of how high-quality semantics reconstruct the class prototype through complementarity between modalities. A periphery image is an image with fewer discriminative features; a prototype image is one with concrete, rich representative features.
  • Figure 2: Illustration of how Semantic Evolution converts a simple class name and its definition into a high-quality description.
  • Figure 3: The framework of our proposed SemFew. During the training stage, images and paraphrased semantics are encoded and fed into SemAlign $h$, with the objective of reducing the distance between the output of $h$ and the class prototype in the visual space. During the testing stage, images in the support set are transformed into class prototypes by $h$, and query images are classified by identifying the nearest prototype. The symbol $\oplus$ denotes a concatenation operation.
  • Figure 4: Average results (%) on different semantics.
  • Figure 5: Visualization results on the MiniImageNet dataset. Different colors or shapes represent different classes. The $\star$ represents the class prototypes, and the $\diamondsuit$ denotes the prototypes reconstructed by our method.
  • ...and 1 more figures
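
The pipeline described in the Figure 3 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SemAlignSketch`, the helper functions, the ReLU activation, the random initialisation, the squared Euclidean distance, and the averaging of per-shot outputs are all hypothetical choices made for the example. The source only states that $h$ is a two-layer network, that its input is the concatenation ($\oplus$) of visual and semantic features, that training reduces the distance between its output and the class prototype in the visual space, and that queries are assigned to the nearest prototype. No gradient step is shown; the sketch only evaluates the objective.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SemAlignSketch:
    """Hypothetical two-layer network h: (visual (+) semantic) -> visual space."""

    def __init__(self, vis_dim, sem_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = vis_dim + sem_dim  # size of the concatenated input
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
                   for _ in range(vis_dim)]

    def __call__(self, visual, semantic):
        x = list(visual) + list(semantic)  # the (+) concatenation
        hidden = [max(0.0, dot(row, x)) for row in self.w1]  # ReLU is assumed
        return [dot(row, hidden) for row in self.w2]


def alignment_loss(h, visual, semantic, class_prototype):
    """Training objective: distance between h's output and the class prototype."""
    return squared_distance(h(visual, semantic), class_prototype)


def reconstruct_prototype(h, support_features, semantic):
    """Test time: map each support feature through h, then average (assumed)."""
    return mean_vector([h(v, semantic) for v in support_features])


def classify(query_feature, prototypes):
    """Return the label whose prototype is nearest to the query feature."""
    return min(prototypes,
               key=lambda label: squared_distance(query_feature, prototypes[label]))


if __name__ == "__main__":
    h = SemAlignSketch(vis_dim=3, sem_dim=2, hidden_dim=4)
    # Toy 2-way 2-shot episode with made-up features and semantic embeddings.
    support = {"cat": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
               "dog": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]}
    semantics = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    prototypes = {label: reconstruct_prototype(h, feats, semantics[label])
                  for label, feats in support.items()}
    print(classify([0.8, 0.2, 0.0], prototypes))
```

With untrained random weights the predicted label in the usage block carries no meaning; the point is only the data flow: concatenate, map into the visual space, then compare the query against the reconstructed prototypes.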