Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia, Qing Zhou, Wei Huang, Junyu Gao, Qi Wang
TL;DR
This work tackles few-shot learning by fusing general textual priors with image-specific visual cues to form a robust general-specific representation. It introduces Bidirectional Knowledge Permeation (BKP) to exchange information between meta-class prompts and visual features, and Semantic Adversarial Disentanglement (SAD) to mitigate the base-set bias that harms novel-class generalization. The resulting method, BiKop, delivers state-of-the-art results on four benchmarks (miniImageNet, tieredImageNet, CIFAR-FS, FC100), with notable gains in 1-shot settings and solid performance in 5-shot settings, validated through extensive ablations and visual analyses. Its meta-class-specific prompts, cross-modal permeation, and adversarial disentanglement provide a practical, backbone-friendly framework for leveraging textual knowledge in few-shot recognition, and may benefit downstream vision-language tasks with limited labeled data.
Abstract
Few-shot learning aims to generalize a recognizer from seen categories to entirely novel ones. Because only a few support samples are available, several recent methods introduce class names as prior knowledge for identifying novel classes. However, how to harness the mutual advantages of visual and textual knowledge is still not well understood. In this paper, we propose a coherent Bidirectional Knowledge Permeation strategy called BiKop, which is grounded in a human intuition: a class name description offers a general representation, whereas an image captures the specificity of individuals. BiKop first establishes a hierarchical joint general-specific representation through bidirectional knowledge permeation. Then, considering the bias of the joint representation towards the base set, we disentangle base-class-relevant semantics during training, thereby alleviating the suppression of potential novel-class-relevant information. Experiments on four challenging benchmarks demonstrate the remarkable superiority of BiKop. Our code will be publicly available.
