Thousand Voices of Trauma: A Large-Scale Synthetic Dataset for Modeling Prolonged Exposure Therapy Conversations

Suhas BN, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, Saeed Abdullah

TL;DR

This paper addresses the scarcity of trauma-focused clinical dialogue data by introducing Thousand Voices of Trauma, a large-scale synthetic dataset of 3,000 Prolonged Exposure (PE) therapy conversations drawn from 500 simulated cases across six therapy phases. Using clinically informed prompts and Claude Sonnet 3.5, the authors generate demographically diverse transcripts covering 20 trauma types and 10 trauma-related behaviors, accompanied by an emotion-trajectory benchmark for evaluating AI models. Therapists validated the transcripts for clinical fidelity, identifying strengths in emotional depth and narrative realism while highlighting conversational flow and authenticity as areas for improvement. A standardized benchmark based on the six PE phases and three similarity metrics supports consistent model comparison, and the dataset is released with accompanying code to support research on privacy-preserving, trauma-focused AI tools for clinician training and patient-facing applications. Overall, the work offers a scalable resource for PE therapy modeling, evaluation, and training that avoids the privacy constraints of real-world clinical data.

Abstract

The advancement of AI systems for mental health support is hindered by limited access to therapeutic conversation data, particularly for trauma treatment. We present Thousand Voices of Trauma, a synthetic benchmark dataset of 3,000 therapy conversations based on Prolonged Exposure therapy protocols for Post-traumatic Stress Disorder (PTSD). The dataset comprises 500 unique cases, each explored through six conversational perspectives that mirror the progression of therapy from initial anxiety to peak distress to emotional processing. We incorporated diverse demographic profiles (ages 18-80, M=49.3, 49.4% male, 44.4% female, 6.2% non-binary), 20 trauma types, and 10 trauma-related behaviors using deterministic and probabilistic generation methods. Analysis reveals realistic distributions of trauma types (witnessing violence 10.6%, bullying 10.2%) and symptoms (nightmares 23.4%, substance abuse 20.8%). Clinical experts validated the dataset's therapeutic fidelity, highlighting its emotional depth while suggesting refinements for greater authenticity. We also developed an emotional trajectory benchmark with standardized metrics for evaluating model responses. This privacy-preserving dataset addresses critical gaps in trauma-focused mental health data, offering a valuable resource for advancing both patient-facing applications and clinician training tools.
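The headline numbers above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the authors' released code: it takes the gender counts reported in the Figure 1 caption (247 male, 222 female, 31 non-binary) and the six conversations per case stated in the abstract, and recomputes the case total, the conversation total, and the percentage shares. The variable names are illustrative.

```python
# Gender counts as reported in the Figure 1 caption.
gender_counts = {"male": 247, "female": 222, "non-binary": 31}

# Each case is explored through six conversational perspectives (abstract).
phases_per_case = 6

# Total number of unique cases and of conversations.
n_cases = sum(gender_counts.values())
n_conversations = n_cases * phases_per_case

# Percentage share of each gender group, rounded to one decimal place.
shares = {g: round(100 * c / n_cases, 1) for g, c in gender_counts.items()}

print(n_cases, n_conversations, shares)
```

Under these inputs the totals agree with the abstract: 500 cases, 3,000 conversations, and shares of 49.4%, 44.4%, and 6.2%.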

Paper Structure

This paper contains 27 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Demographic distribution of synthetic participants across gender, age, ethnicity, and relationship status. Most identified as male (247) or female (222), with 31 non-binary participants (uscensus2025). Ages spanned under 10 to over 90, with a majority between 30–70. Ethnicities were diverse, led by Latin American, North American, and South/Southeast Asian groups. Most participants were married or single.
  • Figure 2: Distribution of trauma types and exhibited behaviors in synthetic participants. Nightmares, substance abuse, and compulsive behaviors were most common. Top trauma types included witnessing violence, bullying, neglect, and medical trauma. Less frequent but notable were abuse-related and combat-related experiences.
  • Figure 3: The figures illustrate structure and language diversity in synthetic therapist-client dialogues. The Utterance Length Distribution (top) shows clients often speak at length (>50 words), while therapists' responses are concise, reflecting the client-centered nature of therapy. The Vocabulary Diversity (bottom) reveals clients use 24,000 unique words, far more than therapists (~5,000), likely due to personal narratives, whereas therapists maintain structured, reflective language.
  • Figure 4: The figure depicts conversation flow in synthetic dialogues, showing exchange lengths over time across three phases: Setup, Exposure, and Processing. In Setup, lengths remain stable (~40 to 45 words). Exposure sees a steady increase, peaking at 60 words, indicating deeper engagement. Processing shows fluctuations, reflecting varying reflection and emotional processing. The shaded region represents variability across conversations.
  • Figure 5: Therapist ratings ($N{=}7$) across four dimensions of synthetic PE sessions: Content Depth, Perceived Value, Session Appropriateness, and Patient Engagement.
  • ...and 1 more figure