Cooperative Meta-Learning with Gradient Augmentation

Jongyun Shin; Seunjin Han; Jangho Kim

TL;DR

This work proposes a novel cooperative meta-learning framework, dubbed CML, which applies gradient-level regularization through gradient augmentation: it injects learnable noise into the model's gradient to improve generalization.

Abstract

Model-agnostic meta-learning (MAML) is one of the most widely used gradient-based meta-learning methods. It consists of two optimization loops: an inner loop and an outer loop. MAML learns a new task from the meta-initialization parameters with an inner update and finds the meta-initialization parameters in the outer loop. In general, injecting noise into the gradient of a model to augment that gradient is a widely used regularization method. In this work, we propose a novel cooperative meta-learning framework, dubbed CML, which leverages gradient-level regularization with gradient augmentation. We inject learnable noise into the gradient of the model to improve its generalization. The key idea of CML is to introduce a co-learner, which has no inner update but is updated in the outer loop, to augment gradients for finding better meta-initialization parameters. Since the co-learner is not updated in the inner loop, it can be easily deleted after meta-training. Therefore, CML infers with only the meta-learner, without additional cost or performance degradation. We demonstrate that CML is easily applicable to gradient-based meta-learning methods and that it improves performance in few-shot regression, few-shot image classification, and few-shot node classification tasks. Our code is available at https://github.com/JJongyn/CML.
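The loop structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a scalar linear model (`psi` as feature extractor, `theta` as meta-learner head, `phi` as co-learner head), a first-order approximation of the outer gradient, and a hypothetical weight `gamma` on the co-learner loss. All function and parameter names are illustrative.

```python
import random


def mse_grads(w_feat, w_head, xs, ys):
    """Mean squared error and its gradients for y_hat = w_head * (w_feat * x)."""
    n = len(xs)
    loss = g_feat = g_head = 0.0
    for x, y in zip(xs, ys):
        err = w_head * w_feat * x - y
        loss += err * err / n
        g_feat += 2.0 * err * w_head * x / n
        g_head += 2.0 * err * w_feat * x / n
    return loss, g_feat, g_head


def inner_adapt(psi, theta, xs, ys, inner_lr):
    """Inner loop: one gradient step on the support set (co-learner not involved)."""
    _, g_psi, g_theta = mse_grads(psi, theta, xs, ys)
    return psi - inner_lr * g_psi, theta - inner_lr * g_theta


def cml_outer_step(psi, theta, phi, tasks, inner_lr=0.05, outer_lr=0.01, gamma=1.0):
    """One outer update (first-order approximation, an assumption of this sketch).

    Each task is a tuple (support_x, support_y, query_x, query_y).
    """
    d_psi = d_theta = d_phi = 0.0
    for xs_s, ys_s, xs_q, ys_q in tasks:
        psi_i, theta_i = inner_adapt(psi, theta, xs_s, ys_s, inner_lr)
        # Query loss of the adapted meta-learner path.
        _, gq_psi, gq_theta = mse_grads(psi_i, theta_i, xs_q, ys_q)
        # Query loss of the co-learner: adapted features, un-adapted head phi.
        _, gc_psi, gc_phi = mse_grads(psi_i, phi, xs_q, ys_q)
        # The co-learner contributes an extra gradient to the shared extractor.
        d_psi += gq_psi + gamma * gc_psi
        d_theta += gq_theta
        d_phi += gamma * gc_phi
    n = len(tasks)
    return (psi - outer_lr * d_psi / n,
            theta - outer_lr * d_theta / n,
            phi - outer_lr * d_phi / n)


def sample_task(rng):
    """Hypothetical task family: y = a * x with a random slope a."""
    a = rng.uniform(0.5, 2.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(10)]
    return xs[:5], [a * x for x in xs[:5]], xs[5:], [a * x for x in xs[5:]]


if __name__ == "__main__":
    rng = random.Random(0)
    psi, theta, phi = 1.0, 1.0, 0.5
    for _ in range(100):
        batch = [sample_task(rng) for _ in range(4)]
        psi, theta, phi = cml_outer_step(psi, theta, phi, batch)
    # Meta-test: phi is discarded; only psi and theta are adapted and used.
    xs_s, ys_s, xs_q, ys_q = sample_task(rng)
    psi_t, theta_t = inner_adapt(psi, theta, xs_s, ys_s, inner_lr=0.05)
    print("query loss:", mse_grads(psi_t, theta_t, xs_q, ys_q)[0])
```

In this sketch, `phi` appears only in `cml_outer_step`, so dropping it at meta-test time leaves the inference path identical to plain MAML. Setting `gamma=0.0` removes the co-learner's contribution and reduces the update to first-order MAML.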

Paper Structure (23 sections, 1 theorem, 11 equations, 4 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Gradient-based Meta-learning
Multi-branch framework
Regularization by noise
Methodology
Model-Agnostic Meta-Learning (MAML)
Cooperative Meta-Learning (CML)
Experiments
Few-shot regression
Few-shot image classification
Few-shot node classification
Gradient augmentation analysis
Efficiency analysis of the CML structure
Ablation study
...and 8 more sections

Key Result

Theorem 1

Let the meta-initialization parameters of the base network consisting of $N$ feature extraction layers and the meta-learner as $\omega = \{\psi^{\prime}_{1}, \cdots , \psi^{\prime}_{N}, \theta^{\prime}\}$. Consider the gradient $G^{(\mathcal{\psi^{\prime}}, \mathcal{\theta^{\prime}})} = \{g^{\mathca

Figures (4)

Figure 1: Overall process of CML and comparison with other methods for a given task ($\mathcal{T}_i$). $\mathcal{\psi}$, $\mathcal{\theta}$, and $\phi$ denote the meta-initialization parameters of the feature extractor, meta-learner, and co-learner, respectively. The feature extractor $\psi$ (the body layers of the DNN) extracts the features. The meta-learner $\mathcal{\theta}$ and co-learner $\phi$ (the classifiers) predict outputs based on those features. $\mathcal{\psi}_{i}^{\prime}$, $\mathcal{\theta}_{i}^{\prime}$, and $\mathcal{\phi}_{i}^{\prime}$ denote the parameters adapted to task $i$ during the inner loop. Because CML does not adapt the co-learner to the task, keeping it for generalization through gradient augmentation, CML can infer without additional cost after meta-training. In meta-testing, CML evaluates performance after task adaptation, like standard MAML with $\mathcal{\psi}$ and $\mathcal{\theta}$. In contrast, CML$^{\dagger}$ has parameters $\mathcal{\psi}$ and $\phi$; only $\mathcal{\psi}$ undergoes task adaptation before performance is evaluated.
Figure 2: Results of MAML and CML on the 5-, 10-, and 20-shot simple regression task.
Figure 3: (a) Accuracy of MAML with random noise and of CML. (b) Gradient similarity between the meta-learner and co-learner at the 4th convolution layer. (c) Comparison of the gradient norm of the feature extractor in MAML, CL, and CML after task adaptation in the inner loop. Here we ignore the effect of the bias because its impact is negligible. (d) CKA similarity of representations before and after task adaptation in the inner loop.
Figure 4: t-SNE of (a) MAML and (b) CML trained on miniImageNet. We perform the adaptation with the support set and then evaluate the method with the query set.

Theorems & Definitions (2)

Theorem 1
proof
