MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions
Aishik Mandal, Tanmoy Chakraborty, Iryna Gurevych
TL;DR
The paper addresses the private-data bottleneck in scaling psychological counseling by introducing MAGneT, a multi-agent framework that decomposes counselor response generation into specialized agents (reflection, questioning, solution provision, normalization, psycho-education) coordinated by a CBT-based planning agent and a turn-level technique selector. It couples this with a client-simulation component to generate realistic, privacy-preserving synthetic counseling sessions. A unified evaluation framework aggregates CTRS, PANAS, and WAI metrics and expands expert assessment to nine counseling dimensions. Under this framework, MAGneT improves on baselines in data diversity, quality, and downstream utility: experts preferred MAGneT sessions in 77.2% of cases, automatic CTRS scores rose by 3.2% for general counseling skills and 4.3% for CBT-specific skills, and a Llama3-8B-Instruct model fine-tuned on MAGneT data outperformed models trained on baseline data by 6.9% on average on CTRS. The work provides open-source data and models and establishes a more rigorous, multi-faceted evaluation standard for synthetic counseling data. It also discusses limitations such as session length, cultural/linguistic scope, and modality constraints, and outlines paths toward more ecologically valid, multimodal, and longitudinal counseling research.
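The decomposition described above can be illustrated with a minimal Python sketch. Only the five technique names, the CBT-based planning step, and the turn-level technique selector come from the summary; every function name, the keyword rule in the selector, and the way drafts are merged are hypothetical stand-ins for what would be LLM calls, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Technique names come from the summary; all other names here are hypothetical.
TECHNIQUES = [
    "reflection",
    "questioning",
    "solution_provision",
    "normalization",
    "psycho_education",
]


@dataclass
class Session:
    plan: str = ""
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)


def plan_agent(client_profile: str) -> str:
    """Stand-in for the CBT-based planning agent (an LLM call in practice)."""
    return f"CBT plan for: {client_profile}"


def select_techniques(session: Session, client_utterance: str) -> List[str]:
    """Stand-in for the turn-level technique selector.

    Assumption: a toy keyword rule replaces whatever model-based
    selection the real system uses.
    """
    chosen = ["reflection"]
    if "?" in client_utterance:
        chosen.append("psycho_education")
    else:
        chosen.append("questioning")
    return chosen


def specialized_agent(technique: str, session: Session, client_utterance: str) -> str:
    """Stand-in for one specialized agent drafting text for its technique."""
    if technique not in TECHNIQUES:
        raise ValueError(f"unknown technique: {technique}")
    return f"[{technique}] draft responding to: {client_utterance}"


def compose_response(drafts: Dict[str, str]) -> str:
    """Merge per-technique drafts into one counselor response (assumed step)."""
    return " ".join(drafts.values())


def counselor_turn(session: Session, client_utterance: str) -> str:
    """One counselor turn: select techniques, draft per technique, merge."""
    session.turns.append(("client", client_utterance))
    techniques = select_techniques(session, client_utterance)
    drafts = {t: specialized_agent(t, session, client_utterance) for t in techniques}
    response = compose_response(drafts)
    session.turns.append(("counselor", response))
    return response


session = Session(plan=plan_agent("client reporting work-related anxiety"))
reply = counselor_turn(session, "I feel anxious before every meeting.")
print(reply)
```

Keeping the selector separate from the per-technique agents means each turn only invokes the agents it needs, and swapping any stub for a real model call leaves the loop in `counselor_turn` unchanged.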
Abstract
The growing demand for scalable psychological counseling highlights the need for high-quality, privacy-compliant data, yet such data remains scarce. Here we introduce MAGneT, a novel multi-agent framework for synthetic psychological counseling session generation that decomposes counselor response generation into coordinated sub-tasks handled by specialized LLM agents, each modeling a key psychological technique. Unlike prior single-agent approaches, MAGneT better captures the structure and nuance of real counseling. We further propose a unified evaluation framework that consolidates diverse automatic metrics and expands expert assessment from four to nine counseling dimensions, thus addressing inconsistencies in prior evaluation protocols. Empirically, MAGneT substantially outperforms existing methods: experts prefer MAGneT-generated sessions in 77.2% of cases, and sessions generated by MAGneT score 3.2% higher on general counseling skills and 4.3% higher on CBT-specific skills on the Cognitive Therapy Rating Scale (CTRS). An open-source Llama3-8B-Instruct model fine-tuned on MAGneT-generated data also outperforms models fine-tuned on baseline synthetic datasets by 6.9% on average on CTRS. We also make our code and data public.
