On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

Yaqian Hao, Chenguang Hu, Yingying Gao, Shilei Zhang, Junlan Feng

TL;DR

This work addresses the reliability of speech classification by tackling confidence calibration rather than solely pursuing accuracy. It proposes a joint Energy-Based Model (EBM) framework that couples a discriminative classifier with a generative energy model, enabling calibrated predictions without performance loss. Across language, emotion, and age recognition tasks, EBMs substantially reduce expected calibration error (ECE) and improve negative log-likelihood (NLL) while maintaining or slightly improving accuracy, outperforming post-hoc calibrators such as temperature scaling. The findings demonstrate the practical value of EBMs for robust, uncertainty-aware speech classification in real-world settings.
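The ECE metric cited above measures the gap between a model's confidence and its actual accuracy. As a concrete illustration, here is a minimal sketch of the standard binned ECE computation (the paper's exact binning choices are not stated here; ten equal-width bins is a common default, and the function name is ours):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean |accuracy - confidence| over confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()   # empirical accuracy in this bin
            conf = confidences[in_bin].mean()  # mean predicted confidence
            ece += (in_bin.sum() / n) * abs(acc - conf)
    return ece
```

For example, a model that predicts with 95% confidence but is right only half the time gets an ECE of 0.45, while one whose 80%-confident predictions are correct 80% of the time scores 0.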

Abstract

Deep learning models for speech classification often achieve high accuracy yet fall short on calibration, manifesting as overconfident classifiers. Calibration matters because it underpins the reliability of decision-making in deep learning systems. This study explores the effectiveness of Energy-Based Models (EBMs) in calibrating confidence for speech classification by training a joint EBM that integrates a discriminative model and a generative model, thereby improving the classifier's calibration and mitigating overconfidence. Experimental evaluations were conducted on three speech classification tasks, specifically age, emotion, and language recognition. Our findings highlight the competitive performance of EBMs in calibrating speech classification models. This research underscores the potential of EBMs in speech classification tasks, demonstrating their ability to enhance calibration without sacrificing accuracy.
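A common way to realize such a joint discriminative/generative EBM (the JEM-style formulation of Grathwohl et al.) reuses the classifier's logits: the softmax over logits gives p(y|x), while the negative log-sum-exp of the same logits serves as an energy for p(x). Whether this paper uses exactly this parameterization is an assumption; the sketch below only illustrates the general idea:

```python
import numpy as np

def logsumexp(z):
    # Numerically stable log-sum-exp
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

def energy(logits):
    # E(x) = -logsumexp_y f(x)[y]; lower energy <=> higher unnormalized density p(x)
    return -logsumexp(logits)

def class_probs(logits):
    # The discriminative part is the ordinary softmax over the same logits
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()
```

The appeal of this construction is that one network provides both views: the generative term regularizes the logits' overall scale, which is what tempers the overconfidence of a purely discriminative softmax classifier.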

Paper Structure

This paper contains 12 sections, 12 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Calibration results for three speech classification tasks. A smaller ECE indicates better calibration.
  • Figure 2: The evolving trends of test ACC and NLL for three speech classification tasks with respect to training epochs.
  • Figure 3: Confidence distributions of softmax-based model and energy-based model.