
LLM-as-a-Supervisor: Mistaken Therapeutic Behaviors Trigger Targeted Supervisory Feedback

Chen Xu, Zhenyu Lv, Tian Lan, Xianyang Wang, Luyao Ji, Leyang Cui, Minqiang Yang, Jian Shen, Qunxi Dong, Xiuling Liu, Juan Wang, Bin Hu

TL;DR

The paper tackles scalable therapist training by introducing LLM-as-a-Supervisor, a framework that uses universal mistaken behaviors to generate clear, actionable feedback. It presents Mate, a mistake-driven, multi-agent data synthesis pipeline with Validator-Guided Refinement to ensure high-quality supervision data. Fine-tuning open-source models on Mate yields significant gains in mistake localization, category classification, and supervisory feedback quality, with transferable improvements to empathy classification and novice therapist self-efficacy. The results show that lightweight models can achieve professional supervisory capabilities, offering a scalable path for AI-assisted psychotherapy training.
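The TL;DR reports gains in mistake localization and category classification but does not state the scoring protocol. The following is a minimal sketch that assumes turn-level localization, where a prediction counts as correct if it flags the same therapist turn as a gold label. It also assumes category accuracy is computed over correctly localized predictions. The `MistakeLabel` type, both function names, and every category string other than "forcing change" are hypothetical, and the paper's actual metrics may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MistakeLabel:
    """Hypothetical annotation: one mistaken therapist turn and its category."""
    turn: int       # index of the flagged therapist utterance in the dialogue
    category: str   # mistaken-behavior category, e.g. "forcing change"


def localization_prf(gold: list[MistakeLabel], pred: list[MistakeLabel]) -> tuple[float, float, float]:
    """Turn-level precision, recall, and F1 for locating mistaken utterances (assumed protocol)."""
    gold_turns = {m.turn for m in gold}
    pred_turns = {m.turn for m in pred}
    tp = len(gold_turns & pred_turns)  # turns flagged by both the gold labels and the model
    precision = tp / len(pred_turns) if pred_turns else 0.0
    recall = tp / len(gold_turns) if gold_turns else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def category_accuracy(gold: list[MistakeLabel], pred: list[MistakeLabel]) -> float:
    """Share of correctly localized predictions whose category also matches the gold label."""
    gold_by_turn = {m.turn: m.category for m in gold}
    localized = [m for m in pred if m.turn in gold_by_turn]
    if not localized:
        return 0.0
    correct = sum(gold_by_turn[m.turn] == m.category for m in localized)
    return correct / len(localized)


if __name__ == "__main__":
    gold = [MistakeLabel(2, "forcing change"), MistakeLabel(5, "premature advice")]
    pred = [
        MistakeLabel(2, "forcing change"),
        MistakeLabel(4, "premature advice"),  # wrong turn: a false positive
        MistakeLabel(5, "labeling"),          # right turn, wrong category
    ]
    print(localization_prf(gold, pred))   # approximately (0.667, 1.0, 0.8)
    print(category_accuracy(gold, pred))  # 0.5
```

Separating localization from categorization keeps a model that finds the right utterance but mislabels it distinguishable from one that misses the utterance entirely.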

Abstract

Although large language models (LLMs) hold significant promise in psychotherapy, their direct application in patient-facing scenarios raises ethical and safety concerns. This work therefore shifts toward developing an LLM as a supervisor that trains real therapists. Beyond the privacy constraints on clinical therapist-training data, a fundamental contradiction complicates the training of therapeutic behaviors: clear feedback standards are necessary for a controlled training system, yet in practice there is no absolute "gold standard" for appropriate therapeutic behavior. In contrast, many common therapeutic mistakes are universal and identifiable, which makes them effective triggers for targeted feedback and a clearer source of evidence. Motivated by this, we propose a novel therapist-training paradigm: (1) guidelines for mistaken behaviors and targeted correction strategies are first established as standards; (2) a human-in-the-loop dialogue-feedback dataset is then constructed, in which a mistake-prone therapist agent intentionally, yet naturally, commits these standard mistakes during interviews, and a supervisor agent locates and identifies the mistakes and provides targeted feedback; (3) after fine-tuning on this dataset, the resulting supervisor model is deployed for real therapist training. Detailed automated, human, and downstream evaluations demonstrate that models fine-tuned on our dataset, MATE, can provide high-quality feedback in accordance with the clinical guideline, showing significant potential for therapist training.
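Step (2) describes a mistake-driven, multi-agent loop. The sketch below shows one possible control flow under several assumptions. Each agent is a stub standing in for an LLM call, the guideline is a two-entry dictionary with illustrative correction text, and the validator only checks that every injected mistake was flagged. Validator-Guided Refinement (named in the TL;DR) is modeled here simply as re-running the supervisor until the validator accepts. The paper's actual refinement procedure and prompts are not described in this section. All names (`GUIDELINE`, `therapist_agent`, `synthesize_sample`, and so on) are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical guideline: mistaken behavior -> targeted correction strategy.
# Only "forcing change" appears in the source; both correction texts are illustrative.
GUIDELINE = {
    "forcing change": "Explore the client's ambivalence before proposing change.",
    "premature advice": "Reflect the client's concern before offering suggestions.",
}


@dataclass
class Turn:
    speaker: str
    text: str
    mistake: Optional[str] = None  # category injected by the therapist agent, if any


@dataclass
class Sample:
    dialogue: list
    feedback: list = field(default_factory=list)


def therapist_agent(history: list, mistake: Optional[str]) -> Turn:
    # Stub for an LLM call that weaves the planned mistake into a natural utterance.
    text = f"<utterance exhibiting '{mistake}'>" if mistake else "<appropriate utterance>"
    return Turn("therapist", text, mistake)


def patient_agent(history: list) -> Turn:
    # Stub for an LLM call conditioned on the patient's cognitive model.
    return Turn("patient", "<reply consistent with cognitive model>")


def supervisor_agent(dialogue: list, rng: random.Random, miss_rate: float) -> list:
    # Stub: reads the injected labels but randomly misses some to mimic an imperfect LLM.
    feedback = []
    for i, turn in enumerate(dialogue):
        if turn.mistake and rng.random() >= miss_rate:
            feedback.append({"turn": i, "category": turn.mistake,
                             "feedback": GUIDELINE[turn.mistake]})
    return feedback


def validator(sample: Sample) -> bool:
    # Accept only if the flagged turns exactly match the injected ones.
    injected = {i for i, t in enumerate(sample.dialogue) if t.mistake}
    flagged = {f["turn"] for f in sample.feedback}
    return injected == flagged


def synthesize_sample(mistakes: list, rng: random.Random, n_rounds: int = 4,
                      miss_rate: float = 0.3, max_refinements: int = 5) -> Optional[Sample]:
    # Schedule each planned mistake in a distinct therapist round (needs len(mistakes) <= n_rounds).
    plan = dict(zip(rng.sample(range(n_rounds), k=len(mistakes)), mistakes))
    dialogue: list = []
    for r in range(n_rounds):
        dialogue.append(therapist_agent(dialogue, plan.get(r)))
        dialogue.append(patient_agent(dialogue))
    # Validator-guided refinement, modeled here as retrying the supervisor until accepted.
    for _ in range(max_refinements):
        sample = Sample(dialogue, supervisor_agent(dialogue, rng, miss_rate))
        if validator(sample):
            return sample
    return None  # this sketch discards the dialogue; the source additionally has clinicians review instances
```

Calling `synthesize_sample(["forcing change"], random.Random(0))` returns a `Sample` whose feedback flags the turn carrying the injected mistake, or `None` if the supervisor stub missed it on every attempt. Because the mistakes are planted deliberately, their positions serve as free ground truth for checking the supervisor's output.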

Paper Structure

This paper contains 30 sections, 7 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A real example from the LLM Supervisor trained on the proposed Mate dataset, where the model draws on a guideline (mistake-correction pairs shown in Figure 2a) as evidence to highlight mistaken utterances, categorize mistaken behaviors, and provide targeted corrective feedback.
  • Figure 2: Overview of our human-in-the-loop dataset construction pipeline. A multi-agent framework (e) collaboratively generates dialogue-feedback data through: (b) a therapist agent that naturally integrates predefined therapeutic mistakes into conversations, (c) a patient agent that authentically reacts based on its underlying cognitive model, and (d) a supervisor agent that observes therapist-patient interactions to provide targeted feedback. Clinical therapists and supervisors participate in both (a) guideline creation to ensure therapist training objectives and (f) instance review to ensure overall dataset quality.
  • Figure 3: Therapist Training System. A web platform built on Mate with: (1) simulated clients, (2) a dialogue interface, and (3) an LLM-as-a-Supervisor that detects and localizes mistaken therapeutic behaviors (e.g., forcing change) and provides targeted feedback.
  • Figure 4: LLM-as-a-Judge evaluation results comparing critiques generated by Qwen3-8B fine-tuned on the Mate dataset against those of the base model without fine-tuning; a win means the fine-tuned model's critique is preferred, and a loss means the base model's is. The chart shows win, loss, and tie rates across five professional criteria.
  • Figure 5: High agreement between LLM-as-a-Judge and human evaluation in supervisory feedback assessment.
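Figures 4 and 5 report pairwise win/tie/loss rates and judge-human agreement, but the formulas are not given in this section. The following minimal sketch assumes one outcome label per comparison. It uses Cohen's kappa as the agreement statistic, although the paper may use a different measure. The criterion name `criterion_a` and all example labels are made up for illustration.

```python
from collections import Counter

OUTCOMES = ("win", "tie", "loss")


def outcome_rates(judgments: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Per-criterion win/tie/loss rates for the fine-tuned model vs. the base model."""
    rates = {}
    for criterion, outcomes in judgments.items():
        counts = Counter(outcomes)
        rates[criterion] = {o: counts[o] / len(outcomes) for o in OUTCOMES}
    return rates


def cohens_kappa(judge: list[str], human: list[str]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(judge) != len(human) or not judge:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(judge)
    observed = sum(a == b for a, b in zip(judge, human)) / n
    cj, ch = Counter(judge), Counter(human)
    # Expected agreement if both raters labeled independently at their own base rates.
    expected = sum(cj[k] * ch[k] for k in cj.keys() | ch.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; treat as perfect agreement
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    print(outcome_rates({"criterion_a": ["win", "win", "tie", "loss"]}))
    # {'criterion_a': {'win': 0.5, 'tie': 0.25, 'loss': 0.25}}
    judge = ["win", "win", "tie", "loss", "win"]
    human = ["win", "tie", "tie", "loss", "win"]
    print(cohens_kappa(judge, human))  # approximately 0.6875
```

Kappa is used here instead of raw percent agreement because it discounts matches expected by chance, which matters when one outcome (such as "win") dominates the labels.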