HyperMAML: Few-Shot Adaptation of Deep Models with Hypernetworks

M. Przewięźlikowski, P. Przybysz, J. Tabor, M. Zięba, P. Spurek

TL;DR

HyperMAML replaces the gradient-based inner-loop updates of Model-Agnostic Meta-Learning (MAML) with a trainable Hypernetwork that, given support-set embeddings and base-model predictions, produces a task-specific update Δθ so that the updated parameters are θ' = θ + Δθ. This avoids inner-loop backpropagation and second-order optimization, enabling faster, more biologically plausible adaptation while maintaining competitive accuracy. Empirically, HyperMAML outperforms classical MAML on several standard Few-Shot benchmarks and remains competitive with state-of-the-art methods, with notable improvements in computational efficiency. The approach also extends to cross-domain Few-Shot learning and offers a flexible pathway for rapid task transfer with a smaller hyperparameter footprint.
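
The adaptation step described above can be illustrated with a minimal NumPy sketch. It assumes a linear classifier head f_θ(z) = softmax(zW + b) on top of fixed embeddings, a two-layer MLP as the hypernetwork H, and a class-ordered support set that is flattened into one input vector. The function names (`enhance`, `hypernetwork`, `adapt`) and all sizes are hypothetical. Random vectors stand in for the encoder E(·), and the hypernetwork is untrained, so the sketch shows only the data flow (θ' = θ + Δθ with no inner-loop gradients), not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def enhance(support_emb, support_labels, W, b, n_classes):
    # Embedding enhancement: append the base classifier's prediction f_theta(z)
    # and the one-hot ground-truth label to each support embedding z.
    preds = softmax(support_emb @ W + b)
    one_hot = np.eye(n_classes)[support_labels]
    return np.concatenate([support_emb, preds, one_hot], axis=1)

def hypernetwork(enhanced, params):
    # Hypothetical two-layer MLP H: flattens the class-ordered enhanced support set
    # into one vector and emits a flat update for all classifier parameters.
    hidden = np.maximum(0.0, enhanced.reshape(-1) @ params["A1"] + params["c1"])
    return hidden @ params["A2"] + params["c2"]

def adapt(support_emb, support_labels, W, b, params, n_classes):
    # theta' = theta + delta_theta in a single forward pass (no inner-loop gradients).
    delta = hypernetwork(enhance(support_emb, support_labels, W, b, n_classes), params)
    dW, db = delta[: W.size].reshape(W.shape), delta[W.size:]
    return W + dW, b + db

rng = np.random.default_rng(0)
n_way, k_shot, emb_dim, hidden_dim = 3, 2, 4, 16
support_labels = np.repeat(np.arange(n_way), k_shot)       # class-ordered support set
support_emb = rng.normal(size=(n_way * k_shot, emb_dim))   # stand-in for E(x_s)
W, b = rng.normal(size=(emb_dim, n_way)), np.zeros(n_way)  # base classifier theta
in_dim = n_way * k_shot * (emb_dim + 2 * n_way)
out_dim = W.size + b.size
params = {
    "A1": rng.normal(scale=0.1, size=(in_dim, hidden_dim)), "c1": np.zeros(hidden_dim),
    "A2": rng.normal(scale=0.1, size=(hidden_dim, out_dim)), "c2": np.zeros(out_dim),
}
W_new, b_new = adapt(support_emb, support_labels, W, b, params, n_way)
query_emb = rng.normal(size=(5, emb_dim))                  # stand-in for E(x_q)
query_probs = softmax(query_emb @ W_new + b_new)           # f_theta'(E(x_q))
```

Setting all hypernetwork parameters to zero returns θ' = θ unchanged, which is a convenient sanity check. In practice the hypernetwork would be meta-trained so that Δθ improves predictions on the query set; that training loop is not shown.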

Abstract

The aim of Few-Shot learning methods is to train models that can easily adapt to previously unseen tasks based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the general weights of the meta-model, which are then adapted to specific problems in a small number of gradient steps. However, the method's main limitation is that the update procedure is realized by gradient-based optimization. As a consequence, MAML cannot always modify the weights to the required extent in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML in which the training of the update procedure is also part of the model. Namely, instead of updating the weights with gradient descent, HyperMAML uses a trainable Hypernetwork for this purpose. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques on a number of standard Few-Shot learning benchmarks.
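
For contrast, the gradient-based inner loop that HyperMAML replaces can be sketched in NumPy. This is a minimal first-order sketch assuming a linear softmax head on fixed support embeddings, zero weights as a stand-in for the meta-learned initialization, and hypothetical values for the learning rate and step count. Full MAML additionally backpropagates through these steps in the outer loop (the second-order part), which is omitted here. Each step costs a forward and a backward pass over the support set, and the number of steps is a fixed hyperparameter.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def support_loss(Z, Y, W, b):
    # Mean cross-entropy of a linear softmax head; Y holds one-hot labels.
    P = softmax(Z @ W + b)
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

def maml_inner_loop(Z, Y, W, b, lr=0.1, steps=5):
    # First-order sketch of MAML's task adaptation: a fixed number of
    # gradient-descent steps on the support loss, starting from (W, b).
    losses = [support_loss(Z, Y, W, b)]
    for _ in range(steps):
        grad_logits = (softmax(Z @ W + b) - Y) / Z.shape[0]  # dLoss/dlogits
        W = W - lr * Z.T @ grad_logits
        b = b - lr * grad_logits.sum(axis=0)
        losses.append(support_loss(Z, Y, W, b))
    return W, b, losses

rng = np.random.default_rng(0)
n_way, k_shot, emb_dim = 3, 5, 4
Y = np.eye(n_way)[np.repeat(np.arange(n_way), k_shot)]
class_means = rng.normal(size=(n_way, emb_dim))
Z = rng.normal(size=(n_way * k_shot, emb_dim)) + Y @ class_means  # toy support embeddings
W0, b0 = np.zeros((emb_dim, n_way)), np.zeros(n_way)  # stand-in for meta-learned init
_, _, losses_1 = maml_inner_loop(Z, Y, W0, b0, steps=1)
_, _, losses_5 = maml_inner_loop(Z, Y, W0, b0, steps=5)
```

Comparing `losses_1` with `losses_5` shows the trade-off discussed in the abstract: more steps lower the support loss further, at the cost of extra forward and backward passes per task.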

Paper Structure (33 sections, 7 equations, 3 figures, 8 tables, 2 algorithms)

Figures (3)

  • Figure 1: We consider a two-dimensional dataset consisting of four Gaussian clusters (the first column). In the Meta-Learning scenario, each task consists of samples from two horizontal or two vertical ellipses with permuted labels (the second to fifth columns). The MAML model cannot update its parameters for all four tasks with a single gradient step in the inner loop (first row); to reach a reasonable solution, it requires up to five gradient updates (second row). HyperMAML solves the tasks using the Hypernetwork paradigm: it learns to perform only a single update, which nevertheless yields parameters optimal for the given tasks.
  • Figure 2: Overview of the HyperMAML architecture. The input support examples are processed by the encoding network $E(\cdot)$ and delivered to the hypernetwork $H(\cdot)$ together with the true support labels and the predictions of the general model $f_{\theta}(\cdot)$. The hypernetwork transforms them and returns the weight update $\Delta \theta$ for the target classifier $f_{\theta'}$. The query example is transformed by the encoder $E(\cdot)$, and the final class distribution is returned by the target model $f_{\theta'}$ dedicated to the considered task.
  • Figure 3: Illustration of the embedding enhancement mechanism. The support embeddings (which serve as the input to the HyperMAML Hypernetwork) are enhanced with the predictions of the base classifier and their respective ground-truth labels.
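
The toy meta-learning setup in Figure 1 can be approximated with a short NumPy sketch. It reads the caption as four elongated Gaussian clusters, two oriented horizontally and two vertically; a task takes one pair and assigns its two class labels in either order, giving the four tasks. The cluster positions, variances, sample counts and the name `make_tasks` are hypothetical, since the caption does not state them. Because the same region carries label 0 in one task and label 1 in another, no single set of shared weights fits all four tasks, so per-task adaptation has to move the parameters substantially.

```python
import itertools
import numpy as np

# Hypothetical layout: two horizontally and two vertically elongated
# Gaussian "ellipses", each given as (mean, per-axis variance).
ELLIPSES = {
    "horizontal": [((0.0, 2.0), (1.0, 0.05)), ((0.0, -2.0), (1.0, 0.05))],
    "vertical": [((2.0, 0.0), (0.05, 1.0)), ((-2.0, 0.0), (0.05, 1.0))],
}

def make_tasks(rng, n_per_class=10):
    # One binary task per (orientation, label permutation): 2 x 2 = 4 tasks.
    tasks = []
    for orientation, pair in ELLIPSES.items():
        for perm in itertools.permutations(range(2)):
            xs, ys = [], []
            for label, (mean, var) in zip(perm, pair):
                xs.append(rng.normal(loc=mean, scale=np.sqrt(var), size=(n_per_class, 2)))
                ys.append(np.full(n_per_class, label))
            tasks.append((orientation, perm, np.vstack(xs), np.concatenate(ys)))
    return tasks

tasks = make_tasks(np.random.default_rng(0))
for orientation, perm, X, y in tasks:
    print(orientation, perm, X.shape, np.bincount(y))
```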