Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition

Hefeng Wu, Guangzhi Ye, Ziyang Zhou, Ling Tian, Qing Wang, Liang Lin

TL;DR

The paper tackles the challenge of recognizing novel classes from few examples by generating augmented data guided by semantic relations. It introduces two hallucination streams—instance-view leveraging local semantic correlations and global fusion, and prototype-view estimating robust prototypes with semantic-aware resampling—to enrich the training data from base-class knowledge. By encoding semantic information from WordNet and DistilBERT and integrating Grad-CAM-based localization and Tukey transforms, the framework achieves stronger generalization and scalable data augmentation. Empirical results on miniImageNet, tieredImageNet, and CUB show state-of-the-art or competitive performance across tasks and demonstrate meaningful cross-domain transfer, with ablations validating the contribution of each component.

Abstract

Learning to recognize novel concepts from just a few image samples is very challenging because the learned model easily overfits the few available samples, resulting in poor generalizability. One promising but underexplored solution is to compensate for the scarcity of novel-class data by generating plausible samples. However, most existing works along this line exploit visual information only, so the generated data are easily distracted by challenging factors contained in the few available samples. Being aware that semantic information in the textual modality reflects human concepts, this work proposes a novel framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition. The proposed framework enables generating more diverse and reasonable data samples for novel classes through effective information transfer from base classes. Specifically, an instance-view data hallucination module hallucinates each sample of a novel class to generate new data by employing local semantic correlated attention and global semantic feature fusion derived from base classes. Meanwhile, a prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class and the associated distribution from the few samples, thereby harvesting the prototype as a more stable sample and enabling the resampling of a large number of samples. We conduct extensive experiments and comparisons with state-of-the-art methods on several popular few-shot benchmarks to verify the effectiveness of the proposed framework.
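
To make the prototype-view idea more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes non-negative feature vectors, precomputed per-class means and covariances for the base classes, and a vector of semantic similarities between the novel class and each base class (for example, derived from text embeddings). The function names and the parameters `k`, `mix`, and `jitter` are hypothetical and are not taken from the paper.

```python
import numpy as np


def tukey_transform(x, lam=0.5):
    """Tukey's ladder of powers: x**lam for lam != 0, log(x) for lam == 0.

    Assumes non-negative features (e.g. taken after a ReLU).
    """
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x + 1e-6)  # small offset avoids log(0)
    return np.power(x, lam)


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def estimate_prototype(support, base_means, base_covs, sem_sim,
                       k=2, mix=0.5, jitter=1e-3):
    """Hypothetical semantic-aware prototype and covariance estimate.

    support:    (n_shot, d) features of the novel class
    base_means: (n_base, d) per-class feature means of the base classes
    base_covs:  (n_base, d, d) per-class covariances of the base classes
    sem_sim:    (n_base,) semantic similarity of the novel class to each base class
    """
    # Assumption: only the k semantically closest base classes contribute.
    top = np.argsort(sem_sim)[-k:]
    weights = softmax(sem_sim[top])

    # Blend the few-shot mean with a similarity-weighted base-class mean.
    support_mean = support.mean(axis=0)
    base_mean = (weights[:, None] * base_means[top]).sum(axis=0)
    prototype = mix * support_mean + (1.0 - mix) * base_mean

    # Borrow a covariance from the same base classes; jitter keeps it well conditioned.
    d = support.shape[1]
    cov = (weights[:, None, None] * base_covs[top]).sum(axis=0) + jitter * np.eye(d)
    return prototype, cov


def resample(prototype, cov, n, seed=0):
    """Draw n hallucinated feature vectors from N(prototype, cov)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(prototype, cov, size=n)
```

In this sketch, the estimated prototype can itself be added to the training set as one extra sample, and `resample` supplies as many additional feature vectors as needed. How the paper actually weights base classes and calibrates the distribution may differ from these choices.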

Paper Structure (30 sections, 17 equations, 9 figures, 7 tables)

Figures (9)

  • Figure 1: t-SNE visualization of image samples from several classes and their semantic distance to the "ferret" class. Each class is shown in a different color, and the "ferret" class is assumed to have a single support sample (one-shot). The "arctic fox" class is much closer to the "ferret" class in data distribution than the "stage" class is, but this is not correctly reflected by the support sample of the "ferret" class in the visual space. In contrast, such relationships can be revealed in the semantic space. Best viewed in color.
  • Figure 2: Illustration of our data hallucination framework. It generates hallucinated data for novel classes from both the instance and prototype views, with semantic relation guidance, to facilitate model training. The instance-view data hallucination module generates new samples from each instance sample through local semantic correlation and global semantic fusion. The prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class and the associated distribution, thereby harvesting the prototype as a more stable sample and enabling the resampling of a sufficient number of samples. Finally, the hallucinated data are combined with the original data to train the recognition model.
  • Figure 3: Image examples: (a) MSCOCO, (b) Textures, (c) Fungi, (d) Omniglot.
  • Figure 4: Effect analysis of $\lambda$ and $\tau$ on miniImageNet.
  • Figure 5: Effect analysis of the parameters $\alpha$, $p$, and $q$ in PVDH on miniImageNet.
  • ...and 4 more figures