A New First-Order Meta-Learning Algorithm with Convergence Guarantees

El Mahdi Chayti, Martin Jaggi

TL;DR

The MAML objective is shown not to satisfy the smoothness assumption made in previous works; instead, its smoothness constant grows with the norm of the meta-gradient, which theoretically motivates using normalized or clipped-gradient methods rather than the plain gradient method used in previous works.

Abstract

Learning new tasks by drawing on prior experience gathered from other (related) tasks is a core property of any intelligent system. Gradient-based meta-learning, especially MAML and its variants, has emerged as a viable way to accomplish this goal. One problem with MAML is the computational and memory burden of computing the meta-gradients. We propose a new first-order variant of MAML that, unlike other first-order variants, provably converges to a stationary point of the MAML objective. We also show that the MAML objective does not satisfy the smoothness assumption made in previous works; instead, its smoothness constant grows with the norm of the meta-gradient, which theoretically suggests using normalized or clipped-gradient methods rather than the plain gradient method used in previous works. We validate our theory on a synthetic experiment.
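
To illustrate why a smoothness constant that grows with the gradient norm favors clipping, the following is a minimal sketch on a toy problem. The one-dimensional objective $f(x) = x^4/4$ is an assumption chosen only for illustration (its curvature $3x^2$ grows with the gradient magnitude $|x|^3$); it is not the MAML objective, and the function names `plain_gd` and `clipped_gd` are hypothetical, not the paper's algorithm.

```python
def grad(x):
    # Gradient of the toy objective f(x) = x**4 / 4 (an illustrative assumption).
    return x ** 3


def plain_gd(x0, lr, steps, blowup=1e6):
    # Plain gradient descent with a fixed step size.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
        if abs(x) > blowup:
            # Stop early once the iterate has clearly diverged.
            break
    return x


def clipped_gd(x0, lr, steps, clip=1.0):
    # Gradient descent where the gradient is rescaled so that its
    # magnitude never exceeds `clip` (gradient-norm clipping).
    x = x0
    for _ in range(steps):
        g = grad(x)
        scale = min(1.0, clip / abs(g)) if g != 0.0 else 1.0
        x = x - lr * scale * g
    return x


if __name__ == "__main__":
    x_plain = plain_gd(10.0, 0.1, 500)
    x_clip = clipped_gd(10.0, 0.1, 500)
    print("plain GD final iterate:  ", x_plain)
    print("clipped GD final iterate:", x_clip)
```

With the same step size and starting point, the plain update overshoots because the step is too large for the local curvature far from the minimizer, while the clipped update takes bounded steps until the gradient is small and then behaves like ordinary gradient descent. Normalized gradient descent, which always rescales the gradient to unit norm, can be seen as a related variant.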

Paper Structure

This paper contains 19 sections, 68 equations, 1 figure, 1 table, and 1 algorithm.

Figures (1)

  • Figure 1: (Left) Quality of the meta-gradient estimates of different algorithms as a function of the number of inner steps. (Right) Outer loss as a function of the outer iterations for different algorithms, with the number of inner steps fixed at $20$. In the plot labels, $kappa$ denotes $\hat{\kappa}$, the condition number of the problem, chosen large to simulate a difficult problem; $lbda$ is the regularization parameter $\lambda$; $n$ is the dimension of the problem; and $inner\_lr$ is the inner learning rate. GD is used as the outer optimization algorithm, and in the right plot the outer learning rate was tuned for best performance for each algorithm. FO-B-MAML outperforms the other first-order methods and is competitive with the second-order methods MAML and iMAML; in fact, it outperforms iMAML for $cg\in\{2,5\}$, where $cg$ is the number of conjugate gradient steps.