Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech

Emma Reyner-Fuentes, Esther Rituerto-Gonzalez, Carmen Pelaez-Moreno

TL;DR

This work tackles the problem of speaker identity acting as a confounder in speech-based detection of the gender-based violence victim condition (GBVVC). It uses domain-adversarial training to learn speaker-invariant paralinguistic biomarkers. The proposed Domain-Adversarial Model (DAM) reduces speaker-identification accuracy by at least 26.95% (relative) and improves GBVVC classification accuracy by 6.37% (relative), which supports generalization to unseen speakers. Analyses show that GBVVC predictions correlate with pre-clinical PTSD symptoms (EGS-R scores), particularly when speaker information is unlearned, suggesting that trauma-related vocal cues drive the model. The approach supports privacy-preserving, ethics-aware AI for clinical screening. Limitations include the small number of victims in the sample and the restriction to 1-second acoustic frames. Future work points to multimodal inputs and more complex architectures.
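Both headline numbers are relative changes, not percentage-point differences. The following is a minimal sketch of that arithmetic. It assumes each figure is measured against the matching isolated baseline (the Isolated Speaker Model for speaker identification and the Isolated Condition Model for GBVVC), which this summary does not state explicitly. All accuracy values below are invented for illustration and are not results from the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline`, in percent.

    Negative values are reductions, positive values are improvements.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


# Hypothetical accuracies (assumption, NOT values reported in the paper).
speaker_acc_isolated = 0.80       # speaker-ID accuracy of a speaker-only model
speaker_acc_adversarial = 0.60    # speaker-ID accuracy after adversarial unlearning
condition_acc_isolated = 0.70     # GBVVC accuracy of a condition-only model
condition_acc_adversarial = 0.75  # GBVVC accuracy of the adversarial model

speaker_delta = relative_change(speaker_acc_isolated, speaker_acc_adversarial)
condition_delta = relative_change(condition_acc_isolated, condition_acc_adversarial)
print(f"speaker-ID relative change: {speaker_delta:.2f}%")
print(f"GBVVC relative change: {condition_delta:.2f}%")
```

With these made-up values, a drop from 0.80 to 0.60 in speaker-identification accuracy is a 25% relative reduction, even though the absolute drop is 20 percentage points.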

Abstract

Gender-based violence is a pervasive public health issue that severely impacts women's mental health and often leads to conditions such as anxiety, depression, post-traumatic stress disorder, and substance abuse. Detecting the combination of these mental health conditions could therefore help identify victims of gender-based violence. Speech-based artificial intelligence tools show promise for mental health screening, but their performance often deteriorates on speech from previously unseen speakers, which suggests that speaker traits may act as confounding factors. This study introduces a speaker-agnostic approach to detecting the gender-based violence victim condition from speech, aiming to develop robust artificial intelligence models that generalize across speakers. We employ domain-adversarial training to reduce the influence of speaker identity on model predictions. This achieves a 26.95% relative reduction in speaker identification accuracy while improving gender-based violence victim condition classification accuracy by 6.37% (relative). These results suggest that our models capture paralinguistic biomarkers linked to the gender-based violence victim condition rather than speaker-specific traits. Additionally, the model's predictions show a moderate correlation with pre-clinical post-traumatic stress disorder symptoms, supporting the relevance of speech as a non-invasive tool for mental health monitoring. This work lays the foundation for ethical, privacy-preserving artificial intelligence systems to support clinical screening of gender-based violence survivors.
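The correlation claim can be illustrated with a standard Pearson coefficient between per-speaker model scores and EGS-R totals. This is only a sketch. The summary does not say which correlation statistic was used, and the paper's exact method may differ. The aggregation shown here, averaging the predicted GBVVC probability over a speaker's 1-second frames, is also an assumption. The speaker IDs, probabilities, and EGS-R values are invented.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample has zero variance")
    return cov / (sx * sy)


# Hypothetical per-frame GBVVC probabilities for three speakers (not real data).
frame_probs = {
    "spk_a": [0.2, 0.3, 0.25],
    "spk_b": [0.6, 0.7, 0.65],
    "spk_c": [0.4, 0.5, 0.45],
}
# Hypothetical EGS-R totals for the same speakers (not real data).
egs_r = {"spk_a": 10.0, "spk_b": 30.0, "spk_c": 25.0}

speakers = sorted(frame_probs)
speaker_scores = [mean(frame_probs[s]) for s in speakers]  # one score per speaker
r = pearson_r(speaker_scores, [egs_r[s] for s in speakers])
print(f"Pearson r = {r:.3f}")
```

The sketch aggregates scores to one value per speaker before correlating them. Correlating frame-level predictions directly against a per-speaker questionnaire score would count each speaker many times and overstate the effective sample size.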

Paper Structure

This paper contains 12 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Original Domain-Adversarial Neural Network architecture ganin2016domainadversarial.
  • Figure 2: Confusion matrices for the Isolated Condition Model (ICM).
  • Figure 3: Confusion matrices for the Domain-Adversarial Model (DAM).
  • Figure 4: Unlearnt Speaker Model Architecture. The black, thinner arrows correspond to the forward step. The thicker, colored arrows correspond to the backpropagation steps.
  • Figure 5: Isolated Model Architectures. Black, thinner arrows indicate the forward pass. Thicker, colored arrows denote backpropagation flows. Top: Isolated Condition Model (ICM). Bottom: Isolated Speaker Model (ISM).
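Figures 1 and 4 describe the domain-adversarial setup of Ganin et al. A shared feature extractor feeds both a condition classifier and a speaker classifier. The speaker gradient is multiplied by -λ (a gradient reversal layer) before it reaches the shared layers. The pure-Python sketch below illustrates only that update rule. It uses a toy scalar model with squared-error losses. The class name `ToyDANN`, the losses, λ (`lam`), and the learning rate are illustrative assumptions and do not reproduce the paper's architecture or training setup.

```python
from dataclasses import dataclass
from typing import Tuple


def grad_reverse_forward(h: float) -> float:
    """Gradient reversal layer, forward pass: the identity."""
    return h


def grad_reverse_backward(upstream_grad: float, lam: float) -> float:
    """Gradient reversal layer, backward pass: scale the incoming gradient by -lam."""
    return -lam * upstream_grad


@dataclass
class ToyDANN:
    """Hypothetical scalar model: shared feature h = w*x feeds two linear heads."""
    w: float  # shared feature extractor weight
    a: float  # condition (GBVVC) head weight
    b: float  # speaker head weight

    def step(self, x: float, y_cond: float, y_spk: float,
             lr: float, lam: float) -> Tuple[float, float]:
        """One manual-gradient update; returns (condition loss, speaker loss) before it."""
        h = self.w * x                         # shared feature
        c = self.a * h                         # condition prediction
        s = self.b * grad_reverse_forward(h)   # speaker prediction, through the GRL
        loss_c = (c - y_cond) ** 2
        loss_s = (s - y_spk) ** 2

        # Each head descends its own loss as usual.
        grad_a = 2 * (c - y_cond) * h
        grad_b = 2 * (s - y_spk) * h

        # Gradients reaching h: the speaker branch is sign-flipped by the GRL.
        dh_from_cond = 2 * (c - y_cond) * self.a
        dh_from_spk = grad_reverse_backward(2 * (s - y_spk) * self.b, lam)

        # Shared extractor: descends the condition loss, ascends the speaker loss.
        grad_w = (dh_from_cond + dh_from_spk) * x

        self.a -= lr * grad_a
        self.b -= lr * grad_b
        self.w -= lr * grad_w
        return loss_c, loss_s


model = ToyDANN(w=1.0, a=1.0, b=1.0)
print(model.step(x=1.0, y_cond=0.0, y_spk=0.0, lr=0.1, lam=1.0))
```

With `lam=0` the update reduces to ordinary multi-task training. Larger values push the shared feature toward carrying less speaker information, while the speaker head still tries to predict identity. In a deep-learning framework the same rule would normally be written as a custom backward function rather than as hand-derived gradients.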