A Feature Generator for Few-Shot Learning

Heethanjan Kanagalingam, Thenukan Pathmanathan, Navaneethan Ketheeswaran, Mokeeshan Vathanakumar, Mohamed Afham, Ranga Rodrigo

TL;DR

This paper introduces a feature generator that creates visual features from class-level textual descriptions. The generator is trained with a combination of classifier loss, discriminator loss, and a distance loss between the generated features and the true class embeddings.

Abstract

Few-shot learning (FSL) aims to enable models to recognize novel objects or classes from limited labelled data. Feature generators, which synthesize new data points to augment limited datasets, have emerged as a promising solution to this challenge. This paper investigates the effectiveness of feature generators in enhancing the embedding process for FSL tasks. To address the issue of inaccurate embeddings caused by the scarcity of images per class, we introduce a feature generator that creates visual features from class-level textual descriptions. By training the generator with a combination of classifier loss, discriminator loss, and distance loss between the generated features and true class embeddings, we ensure the generation of accurate same-class features and enhance the overall feature representation. Our results show a significant improvement in accuracy over baseline methods, with our approach outperforming the baseline model by 10% in the 1-shot setting and by around 5% in the 5-shot setting. We also evaluate visual-only and visual + textual generators. The code is publicly available at https://github.com/heethanjan/Feature-Generator-for-FSL.
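The three-term generator objective described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the unweighted sum of the three terms, and the use of a "real" target of 1.0 for the discriminator term (a common GAN convention) are assumptions made for illustration. The choice of categorical cross-entropy, binary cross-entropy, and cosine distance follows the description in the Figure 2 caption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Categorical cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])


def binary_cross_entropy(prob, target):
    """Binary cross-entropy for a single predicted probability."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def generator_loss(class_logits, label, disc_prob_real, generated, class_embedding):
    """Sum of classifier, discriminator, and distance losses (hypothetical helper).

    class_logits:    classifier scores for the generated feature
    disc_prob_real:  discriminator's probability that the generated feature is real
    class_embedding: mean of the true image features of the target class
    """
    l_cls = cross_entropy(class_logits, label)
    # Assumption: the generator is rewarded when its output is judged real.
    l_disc = binary_cross_entropy(disc_prob_real, 1.0)
    l_dist = cosine_distance(generated, class_embedding)
    # Assumption: the three terms are summed without weights.
    return l_cls + l_disc + l_dist
```

In this sketch, a generated feature that points in the same direction as the true class embedding contributes zero distance loss, so the remaining loss comes only from the classifier and discriminator terms.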

Paper Structure (23 sections, 11 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The feature generation process. Visual features are generated from the class-level semantic features (a), the generated features are added to the initial true features (b), and the mean feature is updated by considering all the features (c). Finally, the updated feature mean, obtained by combining the generated and real features, converges closer to the true class embedding (d).
  • Figure 2: The overall architecture diagram. To train the generator, the true feature of an image (c) and its corresponding class-level textual description (b) are taken from the image dataset (a). The true class embedding (d) is calculated as the mean of all the true image features belonging to the class of the selected true feature (c). The generator (e) generates visual features (f) from the semantic feature (g), which is extracted from the class description (h) using a text feature extractor (i). The classifier loss (j) is calculated using categorical cross-entropy loss, and the discriminator loss (k) is calculated using binary cross-entropy loss. The cosine distance loss (l) is computed as the distance between the true class embedding (m) and the generated feature. During training, the generator aims to minimize the sum of these three losses (j, k, l). The inference support set (n) contains images and their corresponding class descriptions. Semantic features (g) and visual features (o) are extracted using a text feature extractor (p) and an image feature extractor (q), respectively. The synthetic visual features (f) are generated by inputting the semantic features (g) to the generator (r). The generated feature is multiplied by $\lambda$ and added to the new support set (s), which is subsequently used for the FSL classification task.
  • Figure 3: Classification accuracy for different values of the semantic weight $\alpha$ on the miniImageNet dataset, with Meta-Baseline as the baseline.
  • Figure 4: Visualization of the effect of using textual features for visual feature generation. Here the updated support class embeddings move closer to the true class embedding.
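The support-set update described in the captions of Figures 1, 2, and 4 can be sketched on a toy example. This is one possible reading, not the authors' code: the generated feature is scaled by $\lambda$, appended to the support features, and the class embedding is recomputed as a plain mean. The helper names, the 2-D feature values, and the plain (unweighted) mean are all assumptions chosen for illustration.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_feature(features):
    """Element-wise mean of a list of equal-length feature vectors."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def augment_support(support_features, generated_feature, lam):
    """Return a new support set with the lambda-scaled generated feature appended."""
    scaled = [lam * x for x in generated_feature]
    return support_features + [scaled]


# Toy 1-shot example with made-up 2-D features (illustrative values only).
true_class_embedding = [1.0, 1.0]   # mean of all true features of the class
support = [[1.0, 0.2]]              # single real support feature
generated = [0.6, 1.0]              # feature produced from the class description

before = mean_feature(support)
after = mean_feature(augment_support(support, generated, lam=1.0))

dist_before = cosine_distance(before, true_class_embedding)
dist_after = cosine_distance(after, true_class_embedding)
```

With these toy values, `dist_after` is smaller than `dist_before`, which mirrors the behaviour the captions describe: the updated support class embedding moves closer to the true class embedding. Whether this holds in practice depends on the quality of the generated feature and on the value of $\lambda$.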