Target speaker anonymization in multi-speaker recordings

Natalia Tomashenko, Junichi Yamagishi, Xin Wang, Yun Liu, Emmanuel Vincent

TL;DR

This work tackles the problem of anonymizing a single target speaker within multi-speaker conversations, a scenario common in call centers. It introduces Target Speaker Anonymization (TSA), a three-step pipeline: Target Speaker Extraction (TSE), anonymization of the extracted speech only, and recombination with the non-target speech. The authors propose an evaluation framework that measures privacy against semi-informed attackers via the equal error rate (EER) and utility via tcpWER, DER, and target-speaker ASR WER, complemented by TSE quality measured with SI-SDR. Using two state-of-the-art TSE models and a strong VQ-BN based anonymization front end, they analyze performance across overlapped-speech conditions and reveal trade-offs between privacy gains and utility loss, highlighting the need for joint optimization of TSE and ASR. The work advances practical privacy-preserving speech processing in realistic multi-speaker settings and suggests directions for improving resilience to attackers and downstream task performance.
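The three-step pipeline and the SI-SDR quality measure can be illustrated with a minimal sketch. This is not the authors' implementation: `extract_target` and `anonymize` are hypothetical callables standing in for a TSE model and an anonymization front end, and obtaining the non-target speech by subtracting the extracted signal from the mixture is an assumption made here for illustration only. The `si_sdr` function follows the standard scale-invariant signal-to-distortion ratio definition.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (standard definition)."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def target_speaker_anonymization(mixture, extract_target, anonymize):
    """Sketch of a TSA pipeline; both callables are hypothetical placeholders."""
    # Step 1: target speaker extraction (TSE).
    target = extract_target(mixture)
    # ASSUMPTION: non-target speech is approximated as mixture minus target.
    residual = mixture - target
    # Step 2: anonymize only the extracted target speech.
    anonymized = anonymize(target)
    # Step 3: recombine, trimming to a common length in case it changed.
    n = min(len(anonymized), len(residual))
    return anonymized[:n] + residual[:n]


# Toy usage with synthetic signals and an oracle extractor.
rng = np.random.default_rng(0)
customer = rng.standard_normal(16000)
operator = rng.standard_normal(16000)
mixture = customer + operator
output = target_speaker_anonymization(
    mixture,
    extract_target=lambda mix: customer,  # oracle stand-in for a TSE model
    anonymize=lambda speech: -speech,     # placeholder, not a real anonymizer
)
```

In this toy setup the operator's signal passes through unchanged while the customer's signal is replaced by the placeholder transform. With a real TSE model, extraction errors would leak into both the anonymized and the residual streams, which is why TSE quality is tracked alongside the privacy and utility metrics.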

Abstract

Most existing speaker anonymization research has focused on single-speaker audio, leading to techniques and evaluation metrics optimized for such conditions. This study addresses the significant challenge of speaker anonymization within multi-speaker conversational audio, specifically when only a single target speaker needs to be anonymized. This scenario is highly relevant in contexts like call centers, where customer privacy necessitates anonymizing only the customer's voice in interactions with operators. Conventional anonymization methods are often not suitable for this task. Moreover, current evaluation methodology does not allow privacy protection and utility to be assessed accurately in this complex multi-speaker scenario. This work aims to bridge these gaps by exploring effective strategies for targeted speaker anonymization in conversational audio, highlighting potential problems in their development, and proposing correspondingly improved evaluation methodologies.

Paper Structure

This paper contains 22 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Target speaker anonymization (TSA)