L2AE-D: Learning to Aggregate Embeddings for Few-shot Learning with Meta-level Dropout

Heda Song, Mercedes Torres Torres, Ender Özcan, Isaac Triguero

TL;DR

The paper tackles few-shot learning by improving how class representations are formed from limited examples. It introduces L2AE-D, which uses a channel-wise attention module to aggregate per-channel feature maps across support examples and meta-level dropout to mitigate meta-level overfitting, all within an end-to-end trainable framework. The method yields state-of-the-art performance on Omniglot and competitive results on miniImageNet, and the meta-level dropout also improves several baseline meta-learning approaches. Overall, L2AE-D provides a learned mechanism to emphasise useful features and suppress noisy ones under data scarcity, improving generalisation in few-shot classification tasks.

Abstract

Few-shot learning focuses on learning a new visual concept with very limited labelled examples. A successful approach to tackle this problem is to compare the similarity between examples in a learned metric space based on convolutional neural networks. However, existing methods typically suffer from meta-level overfitting due to the limited number of training tasks and do not normally consider the importance of the convolutional features of different examples within the same channel. To address these limitations, we make the following two contributions: (a) We propose a novel meta-learning approach for aggregating useful convolutional features and suppressing noisy ones based on a channel-wise attention mechanism to improve class representations. The proposed model does not require fine-tuning and can be trained in an end-to-end manner. The main novelty lies in incorporating a shared weight generation module that learns to assign different weights to the feature maps of different examples within the same channel. (b) We also introduce a simple meta-level dropout technique that reduces meta-level overfitting in several few-shot learning approaches. In our experiments, we find that this simple technique significantly improves the performance of the proposed method as well as various state-of-the-art meta-learning algorithms. Applying our method to few-shot image recognition using the Omniglot and miniImageNet datasets shows that it is capable of delivering state-of-the-art classification performance.

Paper Structure

This paper contains 17 sections, 2 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Illustration of our motivation. Each embedding (rounded rectangle) consists of three feature maps (coloured squares), with outliers shown with dashed borders. (a) Binary classification with five training examples per class. We show the real class centres in the embedding space (solid circles) and the mean of each class's embeddings (hollow circles). (b) 4-class classification with one training example per class. Dashed arrows link similar feature maps in the embeddings from different classes.
  • Figure 2: 5-way 1-shot classification with L2AE-D: (1) Training samples are transformed by $f_{\varphi }$ into embeddings (sets of feature maps, shown as coloured squares); (2) To strengthen the first feature map for the first class, we put it in the first channel and the other feature maps in the remaining channels, then we feed the concatenated 5-channel feature maps into $g_{\phi }$ to generate aggregation weights; (3) The 5 feature maps are aggregated based on the generated weights; (4) To make predictions, we feed a query into $f_{\varphi }$, then compare its embedding with the aggregated training embeddings in the distance module. This outputs a one-hot vector representing the predicted label of the query.
  • Figure 3: C-way 5-shot classification with our approach. L2AE-D aggregates embeddings for each class: (1) The training examples are transformed by $f_{\varphi }$ into embeddings represented by a set of feature maps; (2) For each channel, we collect the feature maps and feed them into the attention module; (3) The feature maps are concatenated in depth and fed into $g_{\phi }$ to generate aggregation weights; (4) The feature maps are then aggregated based on the generated weights to represent a feature for this class.
  • Figure 4: The architecture of the attention module.
  • Figure 5: t-SNE visualisation of the aggregated embeddings of unseen classes for a 5-way 1-shot classification task on Omniglot (a) and a 5-way 5-shot task on miniImageNet (b). The embeddings of training samples are shown as points. Aggregated embeddings are shown as triangles. The embeddings of regular examples are shown as crosses. The means of training embeddings are shown as diamonds.
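
The per-channel aggregation described in the Figure 3 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the real $g_{\phi }$ is a learned attention module (Figure 4) whose architecture is not given here, so `toy_weight_generator` is a hypothetical stand-in (average pooling, one linear map, softmax). The softmax normalisation, the `proj` matrix, and all function names are assumptions made for illustration. What the sketch does preserve is the structure from the captions: one weight generator shared across channels, one weight per support example per channel, and a weighted sum of feature maps that forms the class representation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_weight_generator(maps, proj):
    """Hypothetical stand-in for g_phi (assumption, not the paper's module).

    maps: (K, H, W) feature maps of K support examples in ONE channel.
    proj: (K, K) matrix shared across all channels.
    Returns K aggregation weights that sum to 1.
    """
    pooled = maps.mean(axis=(1, 2))  # (K,) one summary value per example
    scores = proj @ pooled           # (K,) one score per example
    return softmax(scores)

def aggregate_class_embedding(embeddings, proj):
    """Aggregate K support embeddings of one class, channel by channel.

    embeddings: (K, C, H, W) outputs of the embedding network f_varphi.
    Returns the aggregated embedding (C, H, W) and the weights (C, K).
    """
    K, C, H, W = embeddings.shape
    aggregated = np.empty((C, H, W))
    weights = np.empty((C, K))
    for c in range(C):
        maps = embeddings[:, c]                   # (K, H, W) for channel c
        w = toy_weight_generator(maps, proj)      # same generator per channel
        weights[c] = w
        aggregated[c] = np.tensordot(w, maps, axes=1)  # weighted sum -> (H, W)
    return aggregated, weights

# Toy 5-shot example: 5 support embeddings, 3 channels, 4x4 feature maps.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 3, 4, 4))
proj = rng.normal(size=(5, 5))
class_embedding, weights = aggregate_class_embedding(support, proj)
```

With `proj` set to all zeros, every example receives the same weight and the result reduces to the plain mean of the support embeddings, which is the baseline that Figure 1 contrasts against. Non-uniform weights let the aggregation down-weight outlying feature maps within a channel.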