Learn To Learn More Precisely

Runxi Cheng, Yongxian Wei, Xianglong He, Wanyun Zhu, Songsong Huang, Fei Richard Yu, Fei Ma, Chun Yuan

TL;DR

The paper addresses the tendency of meta-learning models, including MAML, to latch onto shortcut features in few-shot settings. It introduces Meta Self-Distillation (MSD), a framework that updates the model on different augmented views of the support data in the inner loop and enforces cross-view knowledge consistency on the same query data in the outer loop, using a cosine-similarity-based knowledge-consistency loss. By formalizing knowledge, splitting it into target and noise components, and optimizing for precise target knowledge, MSD yields higher accuracy and more consistent knowledge across standard and augmented few-shot tasks on MiniImageNet and Tiered-ImageNet, with notable gains on larger backbones. The work suggests a path toward more robust, precise learning in meta-learning and points to future extensions to self-supervised regimes and larger models.

Abstract

Meta-learning has been widely applied to few-shot learning and fast adaptation, achieving remarkable performance. While meta-learning methods such as Model-Agnostic Meta-Learning (MAML) and its variants provide a good set of initial parameters, the model still tends to learn shortcut features, which leads to poor generalization. In this paper, we formalize the concept of "learn to learn more precisely", which aims to make the model learn precise target knowledge from data and reduce the effect of noisy knowledge, such as background and noise. To this end, we propose a simple and effective meta-learning framework named Meta Self-Distillation (MSD) that maximizes the consistency of learned knowledge, enhancing the model's ability to learn precise target knowledge. In the inner loop, MSD uses different augmented views of the same support data to update the model separately. Then, in the outer loop, MSD uses the same query data to optimize the consistency of learned knowledge, enhancing the model's ability to learn more precisely. Our experiments demonstrate that MSD performs remarkably well on few-shot classification tasks in both standard and augmented scenarios, effectively boosting the accuracy and consistency of the knowledge learned by the model.
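
The two-loop procedure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the linear softmax classifier, the additive-noise `augment` function, the learning rate, and the exact loss form (one minus the mean cosine similarity of the query outputs) are all assumptions chosen for illustration. The sketch only computes the consistency loss; the outer-loop gradient step on the initial parameters, which would require differentiating through the inner updates, is omitted.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def augment(x, rng, scale=0.1):
    # Hypothetical augmentation: additive Gaussian noise stands in for image augmentations.
    return x + scale * rng.normal(size=x.shape)

def inner_update(W, x, y, lr=0.1, steps=1):
    # Inner loop: a few gradient steps of softmax regression on one support view.
    onehot = np.eye(W.shape[1])[y]
    for _ in range(steps):
        probs = softmax(x @ W)
        grad = x.T @ (probs - onehot) / len(x)  # cross-entropy gradient w.r.t. W
        W = W - lr * grad
    return W

def consistency_loss(out_a, out_b, eps=1e-12):
    # Assumed loss form: 1 - mean cosine similarity between paired query outputs.
    num = (out_a * out_b).sum(axis=1)
    den = np.linalg.norm(out_a, axis=1) * np.linalg.norm(out_b, axis=1) + eps
    return 1.0 - float(np.mean(num / den))

rng = np.random.default_rng(0)
n_way, k_shot, dim = 5, 1, 16
support_x = rng.normal(size=(n_way * k_shot, dim))
support_y = np.repeat(np.arange(n_way), k_shot)
query_x = rng.normal(size=(15, dim))
W0 = 0.01 * rng.normal(size=(dim, n_way))  # shared initial parameters

# Inner loop: update the same initialization on two augmented views of the support set.
W_a = inner_update(W0, augment(support_x, rng), support_y)
W_b = inner_update(W0, augment(support_x, rng), support_y)

# Outer loop (loss only): compare the two updated models on the same query data.
loss = consistency_loss(query_x @ W_a, query_x @ W_b)
print(f"knowledge-consistency loss: {loss:.4f}")
```

A lower loss means the two updated models produce more similar outputs on the query set, which is the property the outer loop is described as optimizing.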

Paper Structure

This paper contains 22 sections, 14 equations, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: The core ideas of Self-Distillation and Meta Self-Distillation. Self-Distillation aims to bring the deep representations of different views closer together, while Meta Self-Distillation aims to learn the same knowledge from different views of the same image.
  • Figure 2: An overview of the proposed MSD. In the inner loop, MSD uses different augmented views of the support data to update $f_{\theta}$. In the outer loop, it maximizes the consistency among the outputs for the same query data produced by the differently updated versions of the initial model.
  • Figure 3: The 5-way 1-shot and 5-way 5-shot classification accuracy and the consistency of learned knowledge for different numbers of inner steps, with 95% confidence intervals, averaged over 2000 tasks.
  • Figure 4: Visual analysis results on the MiniImageNet test set for MAML and MSD.
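
For readers unfamiliar with how a figure such as Figure 3 reports a mean with a 95% confidence interval over many tasks, the following sketch shows one common way to compute it. The normal approximation (half-width of 1.96 standard errors) is an assumption, since the page does not say how the authors computed their intervals, and the per-task accuracies here are simulated, not results from the paper.

```python
import math
import random
import statistics

def mean_and_ci95(values):
    # Mean and half-width of a normal-approximation 95% confidence interval.
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, 1.96 * sem

# Simulated per-task accuracies for 2000 tasks (hypothetical data).
random.seed(0)
accs = [random.gauss(0.5, 0.1) for _ in range(2000)]

mean, half_width = mean_and_ci95(accs)
print(f"accuracy: {100 * mean:.2f} ± {100 * half_width:.2f} %")
```

With 2000 tasks, the interval shrinks with the square root of the task count, which is why few-shot results averaged over many tasks can report narrow intervals even when per-task accuracy varies widely.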