Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech
Emma Reyner-Fuentes, Esther Rituerto-Gonzalez, Carmen Pelaez-Moreno
TL;DR
This work tackles the challenge of confounding speaker identity in speech-based detection of gender-based violence victim condition (GBVVC) by employing domain-adversarial training to learn speaker-invariant paralinguistic biomarkers. The proposed Domain-Adversarial Model (DAM) reduces speaker-identification accuracy by at least $26.95\%$ (relative) and yields a $6.37\%$ relative improvement in GBVVC classification accuracy, promoting generalization to unseen speakers. Analyses show that GBVVC predictions correlate with pre-clinical PTSD symptoms (EGS-R scores), particularly when speaker information is unlearned, suggesting that trauma-related vocal cues drive the model. The approach supports privacy-preserving, ethics-aware AI for clinical screening. Acknowledged limitations include a small victim sample and a focus on 1-second acoustic frames; multimodal inputs and more complex architectures are left for future work.
Abstract
Gender-based violence is a pervasive public health issue that severely impacts women's mental health, often leading to conditions such as anxiety, depression, post-traumatic stress disorder, and substance abuse. Identifying a combination of these mental health conditions could therefore indicate that someone is a victim of gender-based violence. While speech-based artificial intelligence tools show promise for mental health screening, their performance often deteriorates on speech from previously unseen speakers, a sign that speaker traits may act as confounding factors. This study introduces a speaker-agnostic approach to detecting the gender-based violence victim condition from speech, aiming to develop robust artificial intelligence models capable of generalizing across speakers. By employing domain-adversarial training, we reduce the influence of speaker identity on model predictions, achieving a 26.95% relative reduction in speaker identification accuracy while improving gender-based violence victim condition classification accuracy by 6.37% (relative). These results suggest that our models capture paralinguistic biomarkers linked to the gender-based violence victim condition rather than speaker-specific traits. Additionally, the model's predictions show a moderate correlation with pre-clinical post-traumatic stress disorder symptoms, supporting the relevance of speech as a non-invasive tool for mental health monitoring. This work lays the foundation for ethical, privacy-preserving artificial intelligence systems to support clinical screening of gender-based violence survivors.
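Domain-adversarial training is commonly implemented with a gradient reversal step between a shared feature extractor and an auxiliary domain classifier (here, the speaker classifier). The following is a minimal, framework-free sketch of that general idea, not the authors' implementation: the class `GradientReversal`, the weight `lam`, and the helper `relative_change` are hypothetical names introduced for illustration, and the model architecture actually used is not specified in this section. The helper only shows how a relative (as opposed to absolute) change in accuracy, such as the figures quoted above, would typically be computed.

```python
from typing import List


class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    def __init__(self, lam: float) -> None:
        # lam (hypothetical name) weights the adversarial speaker term.
        self.lam = lam

    def forward(self, features: List[float]) -> List[float]:
        # Features reach the speaker classifier unchanged.
        return list(features)

    def backward(self, grad_from_speaker_head: List[float]) -> List[float]:
        # The feature extractor receives the negated gradient, so a step that
        # lowers its own loss raises the speaker-identification loss.
        return [-self.lam * g for g in grad_from_speaker_head]


def relative_change(baseline: float, new: float) -> float:
    """Relative change in percent: negative means reduction, positive means improvement."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0
```

Under this scheme the speaker classifier is still trained to identify speakers, while the shared features are pushed to make that task harder. This is the sense in which speaker information is "unlearned" while the main classifier continues to be trained on the victim-condition label.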
