Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization

Zexin Cai; Henry Li Xinyuan; Ashi Garg; Leibny Paola García-Perera; Kevin Duh; Sanjeev Khudanpur; Nicholas Andrews; Matthew Wiesner

TL;DR

The authors develop several speaker anonymization pipelines and find that a semi-effective speaker verification system can be trained using only emotion representations, which demonstrates how hard it is to separate speaker identity from emotion.

Abstract

Advances in speech technology now allow unprecedented access to personally identifiable information through speech. To protect such information, the differential privacy field has explored ways to anonymize speech while preserving its utility, including linguistic and paralinguistic aspects. However, anonymizing speech while preserving the speaker's emotional state remains challenging. We explore this problem in the context of the VoicePrivacy 2024 Challenge. Specifically, we develop various speaker anonymization pipelines and find that approaches excel at either anonymization or emotion preservation, but not both simultaneously. Achieving both would require an in-domain emotion recognizer. Additionally, we find that it is feasible to train a semi-effective speaker verification system using only emotion representations, demonstrating the challenge of separating these two modalities.

Paper Structure (11 sections, 4 figures, 2 tables)

Introduction
Method
Task Definition and Evaluation Metrics
Anonymization Approaches
Experiments
Dataset
Experimental Details
Anonymization Performance
Achieving the best of both worlds
Speaker-Identifying Information in Emotion Embeddings
Discussions and Conclusions

Figures (4)

Figure 1: Speech anonymization task and evaluation pipeline (with respect to the VoicePrivacy 2024 Challenge)
Figure 2: VC-based and cascaded ASR-TTS anonymization process
Figure 3: Privacy-emotion preservation trade-off
Figure 4: t-SNE visualization on libri-dev emotion embedding space
