Boosting Meta-Training with Base Class Information for Few-Shot Learning

Weihao Jiang, Guodong Liu, Di He, Kun He

TL;DR

This work proposes an end-to-end training paradigm built on two alternating loops. It converges quickly and outperforms existing baselines, which suggests that information from the full training set and the meta-learning training paradigm can reinforce one another.

Abstract

Few-shot learning, a challenging task in machine learning, aims to learn a classifier that can adapt to recognize new, unseen classes from limited labeled examples. Meta-learning has emerged as a prominent framework for few-shot learning. It was originally a task-level learning method, as in Model-Agnostic Meta-Learning (MAML) and Prototypical Networks. A recently proposed training paradigm called Meta-Baseline, which consists of sequential pre-training and meta-training stages, achieves state-of-the-art performance. However, Meta-Baseline is not end-to-end: the meta-training stage can begin only after pre-training is complete. As a result, it suffers from higher training cost and suboptimal performance due to inherent conflicts between the two training stages. To address these limitations, we propose an end-to-end training paradigm consisting of two alternating loops. In the outer loop, we calculate the cross-entropy loss on the entire training set while updating only the final linear layer. In the inner loop, we use the original meta-learning training mode to calculate the loss and incorporate gradients from the outer loss to guide the parameter updates. This training paradigm not only converges quickly but also outperforms existing baselines, indicating that information from the overall training set and the meta-learning training paradigm can mutually reinforce one another. Moreover, being model-agnostic, our framework achieves significant performance gains, surpassing the baseline systems by approximately 1%.
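
The alternating schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the names `train_alternating`, `sgd_step`, `outer_grad_fn`, `inner_grad_fn`, and `outer_weight` are hypothetical, parameters are plain lists of floats, and two points the abstract leaves open are filled with loudly labeled assumptions: the outer gradient is added to the inner gradient as a weighted sum, and it is reused unchanged for all `T` inner steps.

```python
from typing import Callable, Dict, List, Optional, Set

Params = Dict[str, List[float]]  # parameter groups, e.g. "backbone" and "head"
Grads = Dict[str, List[float]]   # gradients keyed by the same group names


def sgd_step(params: Params, grads: Grads, lr: float,
             only: Optional[Set[str]] = None) -> None:
    """Apply one SGD step in place; if `only` is given, update just those groups."""
    for name, grad in grads.items():
        if only is not None and name not in only:
            continue
        params[name] = [p - lr * g for p, g in zip(params[name], grad)]


def train_alternating(params: Params,
                      outer_grad_fn: Callable[[Params], Grads],
                      inner_grad_fn: Callable[[Params], Grads],
                      num_outer: int, T: int,
                      lr: float = 0.1, outer_weight: float = 1.0) -> List[str]:
    """Run `num_outer` outer steps, each followed by `T` inner steps.

    outer_grad_fn: gradients of the classification loss on a large base-class batch.
    inner_grad_fn: gradients of the meta-learning (task-level) loss.
    Returns the sequence of executed step types, for inspection.
    """
    log: List[str] = []
    for _ in range(num_outer):
        # Outer loop: classification loss over base classes,
        # but only the final linear layer ("head") is updated.
        g_outer = outer_grad_fn(params)
        sgd_step(params, g_outer, lr, only={"head"})
        log.append("outer")

        for _ in range(T):
            # Inner loop: task-level loss, guided by the outer gradient.
            # ASSUMPTION: plain weighted sum, with g_outer reused for all T steps.
            g_inner = inner_grad_fn(params)
            combined: Grads = {}
            for name, grad in g_inner.items():
                guide = g_outer.get(name, [0.0] * len(grad))
                combined[name] = [gi + outer_weight * go
                                  for gi, go in zip(grad, guide)]
            sgd_step(params, combined, lr)
            log.append("inner")
    return log
```

In practice the two gradient callables would wrap a real backbone, a linear classifier, and a task sampler. With `num_outer=2` and `T=3`, the returned log is one `"outer"` entry followed by three `"inner"` entries, repeated twice.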

Paper Structure

This paper contains 22 sections, 6 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Accuracy of the second stage of Meta-Baseline on the validation set of $mini$ImageNet over the course of 5-way 5-shot training. We also re-implement the original Prototypical Networks ($ProtoNets$) under the same setting.
  • Figure 2: Few-shot and conventional classification accuracy at the second stage of Meta-Baseline over the course of training.
  • Figure 3: The main framework of our model. In the outer loop, we calculate the classification loss of one large batch from the base classes and update only the linear classifier. In the inner loop, we use the meta-learning method to calculate the loss of tasks and update the model using both the inner loss and the outer loss. The two loops are executed alternately, with $T$ inner loops per outer loop.
  • Figure 4: The parameter update procedures of Meta-Baseline and our model. In Meta-Baseline, meta-training continues from the model parameters that have converged in pre-training, as shown in (a). In contrast, our model employs an end-to-end training process, as illustrated in (b). In the outer loop, we compute the classification loss for a large batch from the base classes. In the inner loop, we use the meta-learning method to calculate task losses and update the model based on both inner and outer losses.
  • Figure 5: Accuracy on the validation set during training. (a) Accuracy of the second stage of Meta-Baseline on the validation sets of $mini$ImageNet and $tiered$ImageNet over the course of training; and (b) accuracy of our Boost-MT method on the validation sets of $mini$ImageNet and $tiered$ImageNet over the course of training. 5-1 denotes the 5-way 1-shot problem and 5-5 denotes the 5-way 5-shot problem.
  • ...and 1 more figures
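
The "5-way 1-shot" and "5-way 5-shot" settings named in the captions refer to how each few-shot task (episode) is built: N classes are drawn, with K labeled support examples per class. The sketch below shows one common way to sample such an episode. It is an illustration only: the function name `sample_episode`, the `data_by_class` layout, and the default of 15 query examples per class are assumptions, not details taken from the paper.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def sample_episode(data_by_class: Dict[str, Sequence[int]],
                   n_way: int = 5, k_shot: int = 1, n_query: int = 15,
                   rng: Optional[random.Random] = None
                   ) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Sample one N-way K-shot episode as (support, query) lists of (item, label).

    data_by_class maps a class name to its example ids (hypothetical layout).
    Labels are re-indexed to 0..n_way-1 within the episode.
    """
    rng = rng or random.Random()
    # Draw N distinct classes for this task.
    classes = rng.sample(sorted(data_by_class), n_way)
    support: List[Tuple[int, int]] = []
    query: List[Tuple[int, int]] = []
    for label, cls in enumerate(classes):
        # Draw K support and n_query query examples without overlap.
        items = rng.sample(list(data_by_class[cls]), k_shot + n_query)
        support += [(x, label) for x in items[:k_shot]]
        query += [(x, label) for x in items[k_shot:]]
    return support, query
```

Calling `sample_episode(data, n_way=5, k_shot=5)` would produce the 5-5 setting, and `k_shot=1` the 5-1 setting. Each class needs at least `k_shot + n_query` examples, otherwise `random.sample` raises a `ValueError`.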