Speaker Anonymisation for Speech-based Suicide Risk Detection

Ziyun Cui, Sike Jia, Yang Lin, Yinan Duan, Diyang Qu, Runsen Chen, Chao Zhang, Chang Lei, Wen Wu

TL;DR

This work addresses privacy concerns in speech-based adolescent suicide-risk detection by systematically evaluating speaker anonymisation methods. It compares traditional signal processing, neural voice conversion, and text-to-speech approaches using a multi-dimensional framework that weighs speaker privacy against the preservation of semantic and emotional cues relevant to risk detection. The key finding is that combining complementary anonymisation methods (notably RVC, which retains acoustic features, and CosyVoice, which retains semantic content) yields near-original detection performance with robust speaker de-identification, demonstrating the potential of hybrid privacy-preserving pipelines. The results have practical implications for deploying speech-based mental health screening systems while protecting vulnerable individuals' identities.

Abstract

Adolescent suicide is a critical global health issue, and speech provides a cost-effective modality for automatic suicide risk detection. Given the vulnerable population, protecting speaker identity is particularly important, as speech itself can reveal personally identifiable information if the data is leaked or maliciously exploited. This work presents the first systematic study of speaker anonymisation for speech-based suicide risk detection. A broad range of anonymisation methods are investigated, including techniques based on traditional signal processing, neural voice conversion, and speech synthesis. A comprehensive evaluation framework is built to assess the trade-off between protecting speaker identity and preserving information essential for suicide risk detection. Results show that combining anonymisation methods that retain complementary information yields detection performance comparable to that of original speech, while achieving protection of speaker identity for vulnerable populations.

Paper Structure

This paper contains 16 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Importance of anonymisation for protecting speaker identity.
  • Figure 2: Structure used for speech-based suicide risk detection.
  • Figure 3: Visualisation of pitch change in semitones relative to A4 (440 Hz).
  • Figure 4: Accuracy of suicide risk detection using the original and anonymised speech. Average of three runs reported.
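The caption of Figure 3 expresses pitch change in semitones relative to A4 (440 Hz). As a minimal sketch of that unit, assuming the standard equal-tempered definition in which one octave (a doubling of frequency) spans 12 semitones, the conversion can be written as below. The function names are hypothetical and are not taken from the paper, which does not specify how its figure was computed.

```python
import math

A4_HZ = 440.0  # reference pitch named in the Figure 3 caption


def semitones_from_a4(freq_hz: float) -> float:
    """Signed distance of freq_hz from A4 in equal-tempered semitones.

    Assumes the standard definition: 12 * log2(f / 440).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return 12.0 * math.log2(freq_hz / A4_HZ)


def shift_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency obtained by shifting freq_hz by a number of semitones."""
    return freq_hz * 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    # One octave above and below A4 are +12 and -12 semitones respectively.
    for f in (220.0, 440.0, 880.0):
        print(f"{f:6.1f} Hz is {semitones_from_a4(f):+.2f} semitones from A4")
```

On this scale, a signal-processing anonymiser that shifts pitch by a fixed number of semitones multiplies every fundamental frequency by the same ratio, which is why a logarithmic unit is a natural choice for visualising the change.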