Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation

Haoguang Zhou, Siyi Wang, Jingyao Wu, James Bailey, Ting Dang

Abstract

Modern speech systems increasingly use discretized self-supervised speech representations for compression and integration with token-based models, yet the effect of this discretization on emotional information remains unclear. We study how residual vector quantization (RVQ) reshapes emotional information in discrete speech representations from both representation- and task-level perspectives. Our analysis shows that aggressive compression disproportionately degrades emotion, with uneven loss across emotion classes and model architectures. To address this, we introduce emotion-aware quantization using emotion-specific and emotion-biased codebooks, improving the preservation of both hard and soft emotion perception. We further propose Emo-Q, a lightweight routed quantization method that selects emotion-specialized codebooks, improving emotion recognition performance at lower bitrates. These results highlight the importance of emotion-aware discretization for robust affective speech processing.

Paper Structure (12 sections, 1 equation, 5 figures, 1 table)

Figures (5)

  • Figure 1: Quantization of discrete speech representation
  • Figure 2: Overview of the pipeline. (a) Balanced and emotion-specific codebooks. (b) Representation-level evaluation: layer-wise degradation analysis (RQ1), primary emotion retention (RQ2), and soft distribution fidelity (RQ3). (c) Task-level evaluation: downstream SER via routed quantization (RQ4).
  • Figure 3: Reconstruction fidelity (cosine similarity, top) and primary emotion recall (bottom) versus quantization depth under RVQ for three different SSL frontends.
  • Figure 4: RQ2: affective retention (left) and codebook utilization (right) for balanced and emotion-specific quantization.
  • Figure 5: RQ3 evaluation: emotion distribution matching (left) and top-2 emotion recall (right) for varying codebook training strategies.
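
The reconstruction-fidelity measurement named in the Figure 3 caption (cosine similarity versus quantization depth under RVQ) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`fit_rvq`, `rvq_reconstruct`), the plain k-means codebook training, and the random Gaussian vectors standing in for SSL features are all assumptions made for illustration only.

```python
import math
import random


def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)),
    )


def fit_codebook(data, size, iters, rng):
    """Plain k-means (Lloyd) codebook; an assumed stand-in for real training."""
    codebook = [list(v) for v in rng.sample(data, size)]
    for _ in range(iters):
        buckets = [[] for _ in range(size)]
        for x in data:
            buckets[nearest(codebook, x)].append(x)
        for k, members in enumerate(buckets):
            if members:  # keep the old codeword if its cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def fit_rvq(data, depth, size, iters=5, seed=0):
    """Fit one codebook per stage; each stage quantizes the previous residual."""
    rng = random.Random(seed)
    residuals = [list(x) for x in data]
    codebooks = []
    for _ in range(depth):
        cb = fit_codebook(residuals, size, iters, rng)
        codebooks.append(cb)
        residuals = [
            [a - b for a, b in zip(r, cb[nearest(cb, r)])] for r in residuals
        ]
    return codebooks


def rvq_reconstruct(x, codebooks):
    """Sum the selected codeword from each stage to approximate x."""
    recon = [0.0] * len(x)
    residual = list(x)
    for cb in codebooks:
        c = cb[nearest(cb, residual)]
        recon = [a + b for a, b in zip(recon, c)]
        residual = [a - b for a, b in zip(residual, c)]
    return recon


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical data: random 8-dimensional vectors in place of SSL features.
rng = random.Random(1)
features = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebooks = fit_rvq(features, depth=4, size=16)

# Mean cosine similarity when only the first d stages are used.
sims = []
for d in range(1, len(codebooks) + 1):
    total = sum(cosine(x, rvq_reconstruct(x, codebooks[:d])) for x in features)
    sims.append(total / len(features))
    print(f"depth={d}  mean cosine similarity={sims[-1]:.3f}")
```

Truncating `codebooks` to its first `d` stages mimics a lower bitrate, so the printed values show how fidelity changes with depth on this toy data. They say nothing about the paper's reported numbers.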