The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI

Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, David Jurgens

TL;DR

The paper presents TRACE, a cross-domain dataset for traumatic event detection drawn from genocide court transcripts, PTSD-related Reddit posts, counseling conversations, and Incel forum posts. It systematically compares multiple language-model architectures: fine-tuned RoBERTa delivers strong in-domain and cross-domain performance, while GPT-4 offers competitive zero-shot results on some datasets. The authors couple model evaluations with explainable AI techniques (SHAP, SLALOM, and concept-based explanations) to reveal both universal and domain-specific trauma cues. The findings highlight trauma features that transfer across contexts and underscore the potential for AI-assisted trauma detection tools, while acknowledging limitations from dataset imbalances and context variability.

Abstract

Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training language models of increasing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts. Our results show that the fine-tuned RoBERTa model excels in predicting traumatic events across domains, slightly outperforming large language models like GPT-4. Additionally, SLALOM feature scores and conceptual explanations effectively differentiate and cluster trauma-related language, highlighting different trauma aspects and identifying sexual abuse and experiences related to death as common traumatic events across all datasets. This transferability is crucial as it allows for the development of tools to enhance trauma detection and intervention in diverse populations and settings.

Paper Structure

This paper contains 26 sections, 12 figures, 8 tables.

Figures (12)

  • Figure 1: We (1) create a cross-domain trauma dataset, (2) classify traumatic events with models of different complexity, and (3) use XAI methods to identify overlapping characteristics of traumatic events.
  • Figure 2: Cross-domain performance (AUC-ROC) when a RoBERTa model is trained on one dataset and tested on other datasets.
  • Figure 3: SHAP values for an instance from the Counseling Dataset: "My dad doesn't like the fact that I'm a boy. He yells at me daily because of it and he tells me I'm extreme and over dramatic. I get so depressed because of my dad's yelling. He keeps asking me why I can't just be happy the way I am and yells at me on a daily basis. Is this considered emotional abuse?"
  • Figure 4: SLALOM feature importance scores based on the full dataset and the RoBERTa model.
  • Figure 5: Trauma-related concepts found in the three datasets (most salient examples, RoBERTa model). For more examples, see the appendix.
  • ...and 7 more figures