Unsupervised Meta-Learning via In-Context Learning

Anna Vettoruzzo, Lorenzo Braccaioli, Joaquin Vanschoren, Marlena Nowaczyk

TL;DR

This paper proposes an approach to unsupervised meta-learning that leverages the generalization abilities of in-context learning observed in transformer architectures. It reframes meta-learning as a sequence modeling problem, so that a transformer encoder learns task context from support images and uses it to predict the labels of query images.

Abstract

Unsupervised meta-learning aims to learn feature representations from unsupervised datasets that can transfer to downstream tasks with limited labeled data. In this paper, we propose a novel approach to unsupervised meta-learning that leverages the generalization abilities of in-context learning observed in transformer architectures. Our method reframes meta-learning as a sequence modeling problem, enabling the transformer encoder to learn task context from support images and utilize it to predict query images. At the core of our approach lies the creation of diverse tasks generated using a combination of data augmentations and a mixing strategy that challenges the model during training while fostering generalization to unseen tasks at test time. Experimental results on benchmark datasets showcase the superiority of our approach over existing unsupervised meta-learning baselines, establishing it as the new state-of-the-art. Remarkably, our method achieves competitive results with supervised and self-supervised approaches, underscoring its efficacy in leveraging generalization over memorization.

Paper Structure (32 sections, 8 equations, 9 figures, 14 tables)


Figures (9)

  • Figure 1: Visualization of CAMeLU (with 3-way 5-shot tasks). The left side illustrates the task creation mechanism, where $N$ samples are drawn from an unlabeled dataset $\mathcal{D}_{train}$. Each sample $x_n$ is augmented $K$ times to obtain $x_{n,k}^{(sp)}$. A strategy inspired by mixup generates the query set from an augmented version of $x_n$, i.e., $\tilde{x}_{n,j}$. The same pseudo-label $n \in [1, N]$ is assigned to all data generated from the sample $x_n$. On the right side, the resulting task is fed into the transformer encoder to predict the query label. Inspired by CAML, the transformer encoder processes demonstrations created by concatenating features from a fixed pre-trained feature extractor and a learned class encoder. The symbol $*$ denotes the unknown query label that the transformer encoder aims to predict.
  • Figure 2: Visualization of clustered embeddings obtained with CAMeLU after the feature extractor (left) and the transformer encoder (right) on a 5-way 5-shot task sampled from the CUB dataset. Crosses indicate the centroids of each class, and the numbers denote the Euclidean distances between the query (triangle) and each class centroid. The plots are obtained using t-SNE with a perplexity of 9.
  • Figure 3: Analysis of learning behavior when transferring knowledge from a different prior dataset. The relative validation accuracy shows the difference between the current and first epoch accuracy on the validation set of miniImageNet, CIFAR-fs, CUB, and Aircraft. CAMeLU is trained with ImageNet-964.
  • Figure 4: Relative validation accuracy of CAMeLU (orange) and CAML (blue) when evaluated in-domain on miniImageNet and computed as in Sect. \ref{subsec:gen_vs_mem}. The curve obtained with CAMeLU shows the three phases of memorization, learning, and generalization even when using a small-scale dataset.
  • Figure 5: Visualization of clustered embeddings obtained with CAMeLU after the fixed feature extractor (left) and the transformer encoder (right) across different datasets. The plots represent 5-way 5-shot tasks during inference. Crosses indicate the centroids for each class, triangles represent the query sample embeddings, and the numbers denote the Euclidean distances between the query and each class centroid. The plots are obtained using t-SNE with a perplexity of 9.
  • ...and 4 more figures
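
The task creation mechanism described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `augment` function is a placeholder (random flip plus noise), the helper names `augment` and `create_task` are hypothetical, and the choice to mix each augmented query with a randomly drawn image using a coefficient drawn uniformly from $[0, 0.5)$ is an assumption about what "inspired by mixup" means here. Pseudo-labels are 0-based in the code, whereas the caption uses $n \in [1, N]$.

```python
import numpy as np


def augment(x, rng):
    """Placeholder augmentation (assumption): random horizontal flip plus small noise.

    x is an image array of shape (H, W, C) with values in [0, 1].
    """
    if rng.random() < 0.5:
        x = x[:, ::-1, :]  # flip along the width axis
    noisy = x + rng.normal(0.0, 0.05, size=x.shape)
    return np.clip(noisy, 0.0, 1.0)


def create_task(dataset, n_way, k_shot, n_query, rng, max_lambda=0.5):
    """Build one pseudo-labeled N-way K-shot task from unlabeled images.

    dataset: array of shape (M, H, W, C), unlabeled, values in [0, 1].
    Returns (support_x, support_y, query_x, query_y).
    """
    # Draw N distinct samples; position n in this draw is the pseudo-label.
    idx = rng.choice(len(dataset), size=n_way, replace=False)

    support_x, support_y = [], []
    for n, i in enumerate(idx):
        # Each sample x_n is augmented K times to form its support examples.
        for _ in range(k_shot):
            support_x.append(augment(dataset[i], rng))
            support_y.append(n)

    query_x, query_y = [], []
    for _ in range(n_query):
        n = int(rng.integers(n_way))
        x_tilde = augment(dataset[idx[n]], rng)  # augmented version of x_n
        # Assumption: mix with a random image, keeping x_tilde dominant
        # (lambda < 0.5) so that the pseudo-label n stays meaningful.
        z = dataset[rng.integers(len(dataset))]
        lam = rng.uniform(0.0, max_lambda)
        query_x.append(lam * z + (1.0 - lam) * x_tilde)
        query_y.append(n)

    return (
        np.stack(support_x),
        np.array(support_y),
        np.stack(query_x),
        np.array(query_y),
    )
```

Calling `create_task(images, n_way=3, k_shot=5, n_query=4, rng=np.random.default_rng(0))` on an array of unlabeled images would yield 15 support examples with pseudo-labels 0 to 2 and 4 mixed query examples, matching the 3-way 5-shot setting shown in Figure 1.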