Speaker Anonymisation for Speech-based Suicide Risk Detection
Ziyun Cui, Sike Jia, Yang Lin, Yinan Duan, Diyang Qu, Runsen Chen, Chao Zhang, Chang Lei, Wen Wu
TL;DR
This work tackles privacy concerns in speech-based adolescent suicide-risk detection by systematically evaluating speaker anonymisation methods. It compares traditional signal processing, neural voice conversion, and text-to-speech approaches using a multi-dimensional framework that weighs speaker privacy against the preservation of semantic and emotional cues relevant to risk detection. The key finding is that combining complementary anonymisation methods (notably RVC for acoustic features and CosyVoice for semantic content) yields near-original detection performance with robust speaker de-identification, demonstrating the potential of hybrid privacy-preserving pipelines. The results have practical implications for deploying speech-based mental health screening systems while protecting vulnerable individuals' identities.
Abstract
Adolescent suicide is a critical global health issue, and speech provides a cost-effective modality for automatic suicide risk detection. Because the population concerned is vulnerable, protecting speaker identity is particularly important: speech itself can reveal personally identifiable information if the data is leaked or maliciously exploited. This work presents the first systematic study of speaker anonymisation for speech-based suicide risk detection. A broad range of anonymisation methods is investigated, including techniques based on traditional signal processing, neural voice conversion, and speech synthesis. A comprehensive evaluation framework is built to assess the trade-off between protecting speaker identity and preserving information essential for suicide risk detection. Results show that combining anonymisation methods that retain complementary information yields detection performance comparable to that of original speech while protecting the speaker identity of vulnerable populations.
