Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning
Liang-Yeh Shen, Shi-Xin Fang, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee
TL;DR
Meta-PerSER personalizes Speech Emotion Recognition (SER) to how an individual listener interprets emotion in speech, in contrast to conventional SER trained on aggregated labels. It builds on Model-Agnostic Meta-Learning (MAML), enhanced with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, to enable rapid personalization from only a few labeled examples. The approach combines robust representations from a pre-trained self-supervised backbone with meta-learning to capture subjective emotion perception without extensive annotation. Evaluation on the IEMOCAP dataset shows significant gains over baselines in both seen and unseen annotator scenarios, indicating that the method generalizes to new listeners and annotation styles. The work suggests broad potential for personalized SER and other subjective tasks, and outlines future directions: multilingual and low-resource settings, and extensions to hate speech detection and customer experience recognition.
Abstract
This paper introduces Meta-PerSER, a novel meta-learning framework that personalizes Speech Emotion Recognition (SER) by adapting to each listener's unique way of interpreting emotion. Conventional SER systems rely on aggregated annotations, which often overlook individual subtleties and lead to inconsistent predictions. In contrast, Meta-PerSER leverages a Model-Agnostic Meta-Learning (MAML) approach enhanced with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, enabling rapid adaptation with only a few labeled examples. By integrating robust representations from pre-trained self-supervised models, our framework first captures general emotional cues and then fine-tunes itself to personal annotation styles. Experiments on the IEMOCAP corpus demonstrate that Meta-PerSER significantly outperforms baseline methods in both seen and unseen data scenarios, highlighting its promise for personalized emotion recognition.
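To make the adaptation step concrete, the following is a minimal sketch of few-shot inner-loop adaptation with per-layer per-step learning rates, assuming a toy two-parameter linear regressor in place of the SER model. The function names, the treatment of `w` and `b` as separate "layers", the support set, and all learning-rate values are illustrative assumptions and are not taken from the paper. Combined-Set Meta-Training, Derivative Annealing, and the outer meta-update are not shown.

```python
# Toy sketch (not the authors' implementation): inner-loop adaptation where
# each "layer" has its own learning rate at each adaptation step.
# All names and values below are hypothetical.

def predict(params, x):
    """Linear model standing in for a listener-specific SER head."""
    return params["w"] * x + params["b"]

def mse(params, data):
    """Mean squared error over a list of (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def grads(params, data):
    """Analytic gradients of the MSE with respect to w and b."""
    n = len(data)
    gw = sum(2 * (predict(params, x) - y) * x for x, y in data) / n
    gb = sum(2 * (predict(params, x) - y) for x, y in data) / n
    return {"w": gw, "b": gb}

def adapt(meta_params, support, inner_lrs):
    """Run a few gradient steps from the shared initialization.

    inner_lrs[layer][step] is the learning rate used for that layer at
    that step; meta_params itself is left unmodified.
    """
    params = dict(meta_params)
    num_steps = len(next(iter(inner_lrs.values())))
    for step in range(num_steps):
        g = grads(params, support)
        params = {k: params[k] - inner_lrs[k][step] * g[k] for k in params}
    return params

# Shared initialization (in meta-learning this would come from meta-training).
meta_params = {"w": 0.0, "b": 0.0}
# A few labeled examples from one hypothetical listener (here y = 2x + 1).
support = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
# One learning rate per layer per step (three adaptation steps).
inner_lrs = {"w": [0.1, 0.1, 0.05], "b": [0.2, 0.1, 0.05]}

adapted = adapt(meta_params, support, inner_lrs)
loss_before = mse(meta_params, support)
loss_after = mse(adapted, support)
print(f"support loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In this sketch the learning rates are fixed by hand. In a meta-learning setup they could instead be treated as learnable quantities and updated in the outer loop together with the initialization.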
