The Language of Trauma: Modeling Traumatic Event Descriptions Across Domains with Explainable AI
Miriam Schirmer, Tobias Leemann, Gjergji Kasneci, Jürgen Pfeffer, David Jurgens
TL;DR
The paper presents TRACE, a cross-domain dataset for traumatic event detection drawn from genocide court transcripts, PTSD Reddit, counseling conversations, and Incel posts. It systematically compares multiple language-model architectures, with fine-tuned RoBERTa delivering strong in-domain and cross-domain performance, while GPT-4 offers competitive zero-shot results on some datasets. The authors couple model evaluations with explainable AI techniques (SHAP, SLALOM, and concept-based explanations) to reveal both universal and domain-specific trauma cues. Findings highlight transferable trauma features across contexts and underscore the potential for AI-assisted trauma detection tools, while acknowledging limitations from dataset imbalances and context variability.
Abstract
Psychological trauma can manifest following various distressing events and is captured in diverse online contexts. However, studies traditionally focus on a single aspect of trauma, often neglecting the transferability of findings across different scenarios. We address this gap by training language models of increasing complexity on trauma-related datasets, including genocide-related court data, a Reddit dataset on post-traumatic stress disorder (PTSD), counseling conversations, and Incel forum posts. Our results show that the fine-tuned RoBERTa model excels in predicting traumatic events across domains, slightly outperforming large language models like GPT-4. Additionally, SLALOM-feature scores and conceptual explanations effectively differentiate and cluster trauma-related language, highlighting different trauma aspects and identifying sexual abuse and experiences related to death as common traumatic events across all datasets. This transferability is crucial as it allows for the development of tools to enhance trauma detection and intervention in diverse populations and settings.
