Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text

Yuyu Jia, Qing Zhou, Wei Huang, Junyu Gao, Qi Wang

TL;DR

This work tackles few-shot learning by fusing general textual priors with image-specific visual cues to form a robust general–specific representation. It introduces Bidirectional Knowledge Permeation (BKP) to exchange information between meta-class prompts and visual features and Semantic Adversarial Disentanglement (SAD) to mitigate base-set bias that harms novel-class generalization. The approach delivers state-of-the-art results across four benchmarks (miniImageNet, tieredImageNet, CIFAR-FS, FC100), with notable gains in 1-shot settings and solid performance in 5-shot regimes, validated through extensive ablations and visual analyses. BiKop’s meta-class-specific prompts, cross-modal permeation, and adversarial disentanglement provide a practical, backbone-friendly framework for leveraging textual knowledge in few-shot recognition, potentially benefiting downstream visual-language tasks with limited labeled data.

Abstract

Few-shot learning aims to generalize the recognizer from seen categories to an entirely novel scenario. With only a few support samples, several advanced methods initially introduce class names as prior knowledge for identifying novel classes. However, obstacles still impede achieving a comprehensive understanding of how to harness the mutual advantages of visual and textual knowledge. In this paper, we propose a coherent Bidirectional Knowledge Permeation strategy called BiKop, which is grounded in a human intuition: A class name description offers a general representation, whereas an image captures the specificity of individuals. BiKop primarily establishes a hierarchical joint general-specific representation through bidirectional knowledge permeation. On the other hand, considering the bias of joint representation towards the base set, we disentangle base-class-relevant semantics during training, thereby alleviating the suppression of potential novel-class-relevant information. Experiments on four challenging benchmarks demonstrate the remarkable superiority of BiKop. Our code will be publicly available.
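
The general-specific intuition in the abstract can be illustrated with a toy episode. The sketch below is not BiKop's method: it replaces bidirectional permeation with a plain one-way convex blend of a class-name embedding (general) and the mean support feature (specific), followed by cosine-similarity classification. The function names, the mixing weight `alpha`, and the three-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fused_prototypes(support, text_emb, alpha=0.5):
    """Blend text (general) and visual (specific) prototypes per class.

    support:  {class_name: [feature vectors of the support images]}
    text_emb: {class_name: embedding of the class name}
    alpha:    hypothetical mixing weight; 1.0 = text only, 0.0 = visual only
    """
    protos = {}
    for cls, feats in support.items():
        visual = l2_normalize(mean_vector(feats))
        general = l2_normalize(text_emb[cls])
        protos[cls] = [alpha * g + (1 - alpha) * v
                       for g, v in zip(general, visual)]
    return protos

def classify(query, protos):
    """Assign the query to the class whose prototype is most similar."""
    return max(protos, key=lambda cls: cosine(query, protos[cls]))

# Toy 2-way 1-shot episode with made-up features and text embeddings.
support = {"cat": [[0.9, 0.1, 0.0]], "dog": [[0.1, 0.9, 0.0]]}
text_emb = {"cat": [1.0, 0.0, 0.2], "dog": [0.0, 1.0, 0.2]}
protos = fused_prototypes(support, text_emb, alpha=0.5)
prediction = classify([0.8, 0.3, 0.1], protos)
```

With a single support image per class, the visual prototype is just that one feature vector, so the text term acts as a prior that steadies an otherwise noisy estimate. This is the motivation the abstract gives for bringing in class names in the first place.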

Paper Structure (22 sections, 16 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Comparison of BiKop with prior studies that introduce textual knowledge. (a) Some methods embed class names into text prototypes, which are directly employed to enhance classifiers. (b) Several recent methods use textual information to modulate the extraction of visual features. (c) Workflow of BiKop. To alleviate the collapse of sparse feature representations, the BKP module harnesses the complementary advantages of textual and visual knowledge through bidirectional permeation between the two. Furthermore, the SAD module adversarially disentangles base-class-relevant semantics to mitigate base-set bias and boost the model’s generalization to novel categories.
  • Figure 2: Configuration diagram of the Bidirectional Knowledge Permeation (BKP) module.
  • Figure 3: Visualization for the implementation of the Semantic Adversarial Disentanglement (SAD) module.
  • Figure 4: Effect of the weight coefficients $\mu$ in the BKP module and $\gamma$ in the overall loss on miniImageNet under the $1$-shot setting.
  • Figure 5: Effect of the sampling times $m$ and the number of layers in the MLP block $\mathcal{D}(\cdot)$ in the SAD module on miniImageNet under the $1$-shot setting.
  • ...and 3 more figures