Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
Christos Sgouropoulos, Christos Nikou, Stefanos Vlachos, Vasileios Theiou, Christos Foukanelis, Theodoros Giannakopoulos
TL;DR
This work tackles few-shot audio classification by integrating a supervised angular contrastive loss into prototypical few-shot training, combined with SpecAugment and self-attention to produce robust unified embeddings. The authors design four modules (augmentation, embedding, few-shot, and contrastive) and compare CPL and Angular Prototype Loss within two training regimes, demonstrating state-of-the-art performance on MetaAudio in a 5-way, 5-shot setup. Key findings show that the angular loss consistently improves representations over standard contrastive losses and plain ProtoNets, often matching or surpassing optimization-based methods such as MAML while requiring significantly less computation. The approach advances practical few-shot audio classification and supports reproducibility through a detailed description of the methodology and dataset handling.
Abstract
Few-shot learning has emerged as a powerful paradigm for training models with limited labeled data, addressing scenarios where large-scale annotation is impractical. While few-shot learning has been studied extensively in the image domain, it remains relatively underexplored in audio classification. In this work, we investigate the effect of integrating a supervised contrastive loss into prototypical few-shot training for audio classification. In particular, we demonstrate that an angular loss improves performance further than the standard contrastive loss. Our method applies SpecAugment followed by a self-attention mechanism to combine the diverse information in augmented versions of the input into one unified embedding. We evaluate our approach on MetaAudio, a benchmark that includes five datasets with predefined splits, standardized preprocessing, and a comprehensive set of few-shot learning models for comparison. The proposed approach achieves state-of-the-art performance in a 5-way, 5-shot setting.
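
To make the 5-way, 5-shot prototypical setup concrete, the following is a minimal sketch of one episode, not the authors' implementation. It assumes that embeddings are already computed (here replaced by toy vectors), that each class prototype is the mean of its support embeddings, and that queries are scored by scaled cosine similarity to the prototypes. The cosine scoring and the `scale` value are illustrative assumptions chosen to reflect the angular flavor of the approach; standard ProtoNets use squared Euclidean distance instead. All function names, including `make_toy_episode`, are hypothetical, and the supervised contrastive term, SpecAugment, and the self-attention module are omitted.

```python
import math
import random


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def angular_log_probs(query, protos, scale=10.0):
    """Log-softmax over scaled cosine similarities to each prototype.

    `scale` is an assumed temperature-like constant, not a value from the paper.
    """
    logits = {label: scale * cosine(query, p) for label, p in protos.items()}
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(x - m) for x in logits.values()))
    return {label: x - log_z for label, x in logits.items()}


def episode_loss(support, queries, scale=10.0):
    """Mean negative log-likelihood of the true labels of the query set."""
    protos = prototypes(support)
    losses = [-angular_log_probs(q, protos, scale)[y] for q, y in queries]
    return sum(losses) / len(losses)


def predict(query, protos, scale=10.0):
    """Return the label of the most probable prototype for a query."""
    log_probs = angular_log_probs(query, protos, scale)
    return max(log_probs, key=log_probs.get)


def make_toy_episode(n_way=5, k_shot=5, n_query=3, dim=8, seed=0):
    """Build a toy episode: class c is a noisy one-hot vector on axis c.

    These vectors stand in for audio embeddings, purely for illustration.
    """
    rng = random.Random(seed)

    def sample(c):
        return [(1.0 if i == c else 0.0) + rng.uniform(-0.05, 0.05)
                for i in range(dim)]

    support = {c: [sample(c) for _ in range(k_shot)] for c in range(n_way)}
    queries = [(sample(c), c) for c in range(n_way) for _ in range(n_query)]
    return support, queries


if __name__ == "__main__":
    support, queries = make_toy_episode()
    protos = prototypes(support)
    accuracy = sum(predict(q, protos) == y for q, y in queries) / len(queries)
    print("episode loss:", episode_loss(support, queries))
    print("episode accuracy:", accuracy)
```

In an actual training loop, the toy vectors would be replaced by the outputs of the embedding network, and the episode loss would be minimized over many randomly sampled episodes. Under the assumptions above, the approach described in the abstract would additionally combine this term with the supervised contrastive loss.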
