ChatThero: An LLM-Supported Chatbot for Behavior Change and Therapeutic Support in Addiction Recovery

Junda Wang, Zonghai Yao, Lingxi Li, Junhui Qian, Zhichao Yang, Hong Yu

TL;DR

ChatThero introduces a memory-persistent, stressor-aware LLM chatbot for addiction recovery, addressing relapse risk and limited access to ongoing care. It models therapy as a multi-agent system with a Patient Agent, Environment Agent, and Therapy Agent, trained through supervised fine-tuning and direct preference optimization to learn MI/CBT strategies and cross-session carryover. The approach uses anonymized Reddit-derived profiles and a stressor ledger to simulate 3–6 session trajectories, and is evaluated with both automatic metrics and human clinician ratings on motivation, confidence, empathy, and clinical relevance. Results show ChatThero outperforming baselines in single- and multi-session settings, especially for harder patient profiles, and demonstrate robustness to stressors. The work offers a scalable framework for addiction recovery and discusses ethical considerations and directions for real-world validation.

Abstract

Substance use disorders (SUDs) affect millions of people, and relapse is common, so treatment often requires multiple sessions. Access to care is limited, which makes recovery support harder. We present \textbf{ChatThero}, an innovative low-cost, multi-session, stressor-aware, and memory-persistent autonomous \emph{language agent} designed to facilitate long-term behavior change and therapeutic support in addiction recovery. Unlike existing work, which mostly fine-tunes large language models (LLMs) on patient-therapist conversation data, ChatThero was trained in a multi-agent simulated environment that mirrors real therapy. We created anonymized patient profiles from recovery communities (e.g., Reddit). We classify patients as \texttt{easy}, \texttt{medium}, or \texttt{difficult}, three levels representing their resistance to recovery. We created an external environment by introducing stressors (e.g., social determinants of health) to simulate real-world situations. We dynamically inject clinically grounded therapeutic strategies (motivational interviewing and cognitive behavioral therapy). Our evaluation, conducted by both blinded human clinicians and an LLM-as-Judge, shows that ChatThero is superior in empathy and clinical relevance. We show that stressor simulation improves the robustness of ChatThero. Explicit stressors increase relapse-like setbacks, matching real-world patterns. We evaluate ChatThero with behavioral change metrics. On a 1--5 scale, ChatThero raises \texttt{motivation} by $+1.71$ points (from $2.39$ to $4.10$) and \texttt{confidence} by $+1.67$ points (from $1.52$ to $3.19$), substantially outperforming GPT-5. On \texttt{difficult} patients, ChatThero reaches the success milestone with $26\%$ fewer turns than GPT-5.

Paper Structure

This paper contains 32 sections, 7 figures, 18 tables.

Figures (7)

  • Figure 1: Overview of ChatThero. Left. Multi-stage CBT with MI. The panel shows CBT stages and a parallel MI layer (Spirit; Four Phases—Engage, Focus, Evoke, Plan; OARS micro-skills; core scales and tools). The Therapy Agent picks from a small bank of CBT strategies and MI tools (0–10 scales, E–P–E, decisional balance, values card, agenda mapping) and then combines and reorders them based on the log and current stressors. Stage tags are for orientation purposes, not in a fixed order. The agent infers the current CBT stage and MI phase and sets the next goal. Center. A six-session course shows step-by-step progress. Each session card has a CBT Goal and an MI Focus that matches the phase (e.g., S1: Engage with scaling; S3: Evoke with double-sided reflections; S5: Plan with commitment and barrier planning). Between sessions, an Environment Agent writes stressors, and the Therapy Agent keeps memory and updates the plan. Right. Patient state and profile. Each AI patient starts with a structured Profile (traits, use history, barriers) and a dynamic Memory.
  • Figure 2: Data synthesis and two-stage training for ChatThero. Left: A Therapy Agent and a Patient Agent generate multi-session dialogues while an Environment Agent injects stressors, prompting strategy switches. Right: Stage-1 SFT teaches safe MI/CBT structure; Stage-2 DPO refines timing and strategy selection. At test time, the agent keeps memory across sessions, adapts after stressors, and targets higher motivation/confidence with lower time-to-success.
  • Figure 3: Single-session, no between-session stressors. Each point is the end-of-visit Motivation/Confidence mean (1--5). Colors denote models; shapes denote difficulty (Easy/Medium/Hard). ChatThero (based on Qwen-7B) scores highest across all difficulties, with the largest margins on Medium/Hard cases.
  • Figure 4: Six-visit episodes with between-session stressors. Grouped, stacked bars by difficulty (Easy/Medium/Hard). Within each visit, bars show ChatThero, GPT-4o, and GPT-4o-mini. Dark segment = start score; light segment = within-visit gain; numbers above bars = end score (1--5). ChatThero shows larger within-visit lifts and higher start scores over time, indicating better carryover under stress, especially on Medium/Hard patients.
  • Figure 5: Single-session, per-difficulty analysis. X-axis = number of unique persuasive strategies used in a dialogue (from the predefined pool in Table \ref{tab:predefined_strategies}); Y-axis = resulting Motivation (1--5). Lines denote difficulty (solid/dashed/dotted). ChatThero shows a stronger positive relation in Medium/Hard cases, suggesting gains come from context-sensitive deployment rather than variety alone. Full information for both motivation and confidence can be found in Figure \ref{fig:motivation+confidence_vs_strategy}.
  • ...and 2 more figures
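
The multi-session loop described in the captions for Figures 1, 2, and 4 can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the class `PatientState`, the functions `environment_agent`, `therapy_agent`, and `run_episode`, the per-visit gains, the stressor setback size, and the stressor list are all hypothetical stand-ins, and no LLM is called. It shows only the control flow: the Environment Agent writes a stressor between sessions, the Therapy Agent picks a strategy from memory and the current stressor, and each visit records a start score, a within-visit gain, and an end score on the 1--5 scale.

```python
from dataclasses import dataclass, field


@dataclass
class PatientState:
    """Toy patient: a difficulty label, a 1-5 motivation score, and a memory log."""
    difficulty: str                      # "easy", "medium", or "difficult"
    motivation: float                    # 1-5 scale, as in the paper's metrics
    memory: list = field(default_factory=list)


# Hypothetical numbers chosen for illustration only (not from the paper).
GAIN_PER_VISIT = {"easy": 0.8, "medium": 0.5, "difficult": 0.3}
STRESSOR_SETBACK = 0.2
STRESSORS = ["job loss", "family conflict", "housing insecurity"]


def clamp(score: float) -> float:
    """Keep a score inside the 1-5 range and round to two decimals."""
    return round(max(1.0, min(5.0, score)), 2)


def environment_agent(session_idx: int) -> str:
    """Stand-in for the Environment Agent: emit one between-session stressor."""
    return STRESSORS[session_idx % len(STRESSORS)]


def therapy_agent(patient: PatientState, stressor) -> str:
    """Stand-in for the Therapy Agent: choose a strategy from memory and stressor."""
    if stressor is not None:
        return "barrier planning"                  # respond to the new stressor
    if not patient.memory:
        return "engage with 0-10 scaling"          # first visit, nothing logged yet
    return "evoke with double-sided reflections"


def run_episode(patient: PatientState, n_sessions: int = 6) -> list:
    """Run a multi-session episode and return one record per visit."""
    records = []
    stressor = None                                # no stressor before the first visit
    for s in range(n_sessions):
        if stressor is not None:                   # stressor lowers the start score
            patient.motivation = clamp(patient.motivation - STRESSOR_SETBACK)
        start = patient.motivation
        strategy = therapy_agent(patient, stressor)
        end = clamp(start + GAIN_PER_VISIT[patient.difficulty])
        patient.motivation = end
        patient.memory.append((s + 1, stressor, strategy))   # cross-session memory
        records.append({"session": s + 1, "stressor": stressor,
                        "strategy": strategy, "start": start,
                        "gain": round(end - start, 2), "end": end})
        stressor = environment_agent(s)            # written between sessions
    return records


if __name__ == "__main__":
    for rec in run_episode(PatientState("difficult", 2.0)):
        print(rec)
```

Each record mirrors the stacked bars of Figure 4 (start score plus within-visit gain equals end score). In a real system the two stand-in agent functions would be replaced by model calls, and the fixed gains by scores from a judge or clinician.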