Episodic fine-tuning prototypical networks for optimization-based few-shot learning: Application to audio classification

Xuanyu Zhuang, Geoffroy Peeters, Gaël Richard

TL;DR

The paper tackles few-shot audio classification by enhancing Prototypical Networks through Rotational Division Fine-Tuning (RDFT), which uses labeled support data to fine-tune the ProtoNet in a test episode without leveraging the query set. It further embeds ProtoNet into optimization-based meta-learners, yielding MAML-Proto and MC-Proto via an episodic fine-tuning strategy that applies RDFT within inner updates. Empirical results on ESC-50 and Speech Commands v2 show that RDFT alone can degrade ProtoNet, but when integrated with MAML/Meta-Curvature, the proposed models achieve substantial gains over a regular ProtoNet, with MC-Proto delivering the strongest accuracy among the tested configurations (though Proto-HA remains SOTA on ESC-50). The approach is presented as a general framework with potential applicability beyond audio to other modalities, and future work includes extending RDFT to additional metric-based FSL methods and providing theoretical insights.

Abstract

The Prototypical Network (ProtoNet) has emerged as a popular choice in Few-shot Learning (FSL) scenarios due to its remarkable performance and straightforward implementation. Building upon such success, we first propose a simple (yet novel) method to fine-tune a ProtoNet on the (labeled) support set of a C-way-K-shot test episode (without using the query set, which is only used for evaluation). We then propose an algorithmic framework that combines ProtoNet with optimization-based FSL algorithms (MAML and Meta-Curvature) to work with such a fine-tuning method. Since optimization-based algorithms endow the target learner model with the ability to adapt quickly to only a few samples, we utilize ProtoNet as the target model to enhance its fine-tuning performance with the help of a specifically designed episodic fine-tuning strategy. The experimental results confirm that our proposed models, MAML-Proto and MC-Proto, combined with our unique fine-tuning method, outperform regular ProtoNet by a large margin in few-shot audio classification tasks on the ESC-50 and Speech Commands v2 datasets. We note that although we have only applied our model to the audio domain, it is a general method and can be easily extended to other domains.
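
For orientation, the two building blocks the abstract combines can be written in their standard textbook form. The notation below is an assumption made for illustration and may differ from the paper's own equations: $f_\theta$ is the embedding network, $S_k$ the support examples of class $k$, $d$ a distance function (commonly squared Euclidean), $\alpha$ an inner-loop learning rate, and $\mathcal{L}$ the episode loss. The first line is the usual ProtoNet prototype and classification rule; the second is the plain MAML inner update, which Meta-Curvature modifies by applying a learned transformation to the gradient.

```latex
c_k = \frac{1}{|S_k|} \sum_{(x_i, y_i) \in S_k} f_\theta(x_i),
\qquad
p_\theta(y = k \mid x) =
  \frac{\exp\bigl(-d(f_\theta(x), c_k)\bigr)}
       {\sum_{k'=1}^{C} \exp\bigl(-d(f_\theta(x), c_{k'})\bigr)}

\theta' = \theta - \alpha \, \nabla_\theta \mathcal{L}(\theta)
```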

Paper Structure (15 sections, 5 equations, 2 figures, 2 tables, 1 algorithm)

Figures (2)

  • Figure 1: Working mechanism of the proposed RDFT method. Here a 3-way-5-shot test episode is used as an example. The 3-way-5-shot labeled support set is divided into a 3-way-4-shot sub-support set and a fake query set, which are together used for fine-tuning the Prototypical Network before evaluating on the original test episode. Only one of the divisions is shown here; the blue boxes move horizontally until every column has been selected as the fake query set.
  • Figure 2: The change in test accuracy (ACC) of regular ProtoNet after fine-tuning (ACC with fine-tuning minus ACC without fine-tuning) for different combinations of learning rate and number of gradient steps. The black dotted line marks where the effect of fine-tuning on ACC is 0.
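
The rotational division described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration of the splitting step only, not the authors' implementation: the function name `rotational_divisions` and the toy string data are hypothetical, and how the resulting losses are scheduled into gradient steps is not specified here, so it is left as a comment.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rotational_divisions(
    support: Sequence[Sequence[T]],
) -> Iterator[Tuple[List[List[T]], List[Tuple[T, int]]]]:
    """Yield (sub_support, fake_query) pairs for a C-way-K-shot support set.

    support[c][k] is the k-th labeled example of class c. For each shot
    index k (one "column" in Figure 1), the k-th example of every class is
    held out as the fake query set, and the remaining K-1 shots per class
    form the sub-support set.
    """
    n_shots = len(support[0])
    if n_shots < 2 or any(len(cls) != n_shots for cls in support):
        raise ValueError("need the same number (>= 2) of shots in every class")
    for k in range(n_shots):
        sub_support = [list(cls[:k]) + list(cls[k + 1:]) for cls in support]
        fake_query = [(cls[k], label) for label, cls in enumerate(support)]
        yield sub_support, fake_query


# Toy 3-way-5-shot episode; strings stand in for audio clips (hypothetical data).
support = [[f"c{c}_s{k}" for k in range(5)] for c in range(3)]
divisions = list(rotational_divisions(support))
for sub_support, fake_query in divisions:
    # A fine-tuning step would build prototypes from sub_support, classify
    # fake_query against them, and backpropagate the resulting loss.
    print(len(sub_support), len(sub_support[0]), [x for x, _ in fake_query])
```

For a 3-way-5-shot episode this produces five divisions, each pairing a 3-way-4-shot sub-support set with three held-out fake queries (one per class). The real query set of the test episode is never touched.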