Prosody-Driven Privacy-Preserving Dementia Detection

Dominika Woszczyk, Ranya Aloufi, Soteris Demetriou

TL;DR

This work tackles privacy concerns in speaker embeddings used for dementia detection by introducing a prosody-driven anonymization approach that preserves diagnostic utility without requiring a dedicated dementia classifier. It combines adversarial learning and mutual-information-guided shuffling to disentangle dementia-relevant prosody from speaker identity, using domain knowledge about dementia speech. Empirical results on ADReSS and ADReSSo show strong privacy gains (near-zero speaker recognition) while maintaining dementia-detection performance around 0.73–0.74 F1, and zero-shot TTS evaluations indicate only moderate degradation in naturalness and intelligibility. The approach offers a practical path toward privacy-preserving healthcare analytics in settings with limited labeled data, while highlighting limitations related to domain knowledge reliance and dataset scale.

Abstract

Speaker embeddings extracted from voice recordings have proven valuable for dementia detection. However, by their nature, these embeddings contain identifiable information, which raises privacy concerns. In this work, we aim to anonymize embeddings while preserving their diagnostic utility for dementia detection. Previous studies rely on adversarial learning and on models trained on the target attribute, and they struggle in limited-resource settings. We propose a novel approach that leverages domain knowledge to disentangle prosody features relevant to dementia from speaker embeddings without relying on a dementia classifier. Our experiments show the effectiveness of our approach in preserving speaker privacy (speaker recognition F1-score of .01%) while maintaining a high dementia detection F1-score of 74% on the ADReSS dataset. Our results are also on par with a more constrained classifier-dependent system on ADReSSo (.01% and .66%), and the approach has no impact on synthesized speech naturalness.

Paper Structure

This paper contains 18 sections, 2 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Mutual information scores of key prosodic features extracted from audio segments for dementia label (left) and speaker identity (ID) (right) on the ADReSS dataset.
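The comparison described in the Figure 1 caption, scoring each prosodic feature against both the dementia label and the speaker ID, can be illustrated with a small sketch. This is not the authors' implementation: the estimator used in the paper is not stated in this excerpt, so the code below assumes a simple histogram-based plug-in estimate of mutual information. The feature names `pause_rate` and `mean_f0`, the four synthetic speakers, and all numeric values are hypothetical and chosen only to make the contrast visible.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


# Synthetic data (assumption): 4 speakers, 50 segments each.
# Speakers 0 and 1 are controls (label 0); speakers 2 and 3 have label 1.
rng = random.Random(0)
speaker_ids, labels = [], []
features = {"pause_rate": [], "mean_f0": []}
for speaker in range(4):
    label = 0 if speaker < 2 else 1
    f0_centre = 110.0 if speaker % 2 == 0 else 210.0  # varies by speaker, not by label
    pause_centre = 0.2 if label == 0 else 0.6         # varies by label
    for _ in range(50):
        speaker_ids.append(speaker)
        labels.append(label)
        features["mean_f0"].append(rng.gauss(f0_centre, 5.0))
        features["pause_rate"].append(rng.gauss(pause_centre, 0.03))

# Score every feature against both targets, as in a two-panel comparison.
scores = {}
for name, values in features.items():
    bins = discretize(values)
    scores[name] = {
        "dementia": mutual_information(bins, labels),
        "speaker": mutual_information(bins, speaker_ids),
    }

for name, result in scores.items():
    print(f"{name}: MI(dementia)={result['dementia']:.3f}  MI(speaker)={result['speaker']:.3f}")
```

In this toy setup, `mean_f0` carries information about who is speaking but almost none about the label, while `pause_rate` is informative about the label. Because each speaker has a single label, any feature that predicts the label also reveals something about the speaker, so the two scores are best read side by side rather than in isolation.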