WiseMind: Recontextualizing AI with a Knowledge-Guided, Theory-Informed Multi-Agent Framework for Instrumental and Humanistic Benefits

Yuqi Wu, Guangya Wan, Jingjing Li, Shengming Zhao, Lingfeng Ma, Tianyi Ye, Ion Pop, Yanbo Zhang, Jie Chen

TL;DR

WiseMind is a knowledge-guided, theory-informed multi-agent framework for psychiatric differential diagnosis (DDx). It combines a structured knowledge graph derived from DSM-5 with two agents, a Reasonable Mind and an Emotional Mind, to balance diagnostic reasoning with empathetic communication. It also introduces a multi-faceted evaluation strategy that covers simulated and human assessments plus ethical risk analysis. The reported results are up to 83.3% DDx accuracy for depression, 73.3% for bipolar disorder, and 80% for anxiety, along with improvements in empathy and clinician-facing realism over single-agent baselines. The work argues that deep contextualization across knowledge, process, and evaluation is needed to translate benchmark NLP advances into clinically meaningful impact, and it acknowledges limitations such as scope and reliance on curated graphs. If scaled and validated clinically, WiseMind could serve as a decision-support tool that enhances diagnostic accuracy, patient rapport, and safety in high-stakes mental health settings.

Abstract

Translating state-of-the-art NLP into practice often stalls at the "last mile" owing to insufficient contextualization of the target domain's knowledge, processes, and evaluation. Psychiatric differential diagnosis exemplifies this challenge: accurate assessments depend on nuanced clinical knowledge, a delicate cognitive-affective interview process, and downstream outcomes that extend far beyond benchmark accuracy. We present WiseMind, a systematic interdisciplinary contextualization framework that delivers both instrumental (diagnostic precision) and humanistic (empathy) gains. WiseMind comprises three components: (i) structured knowledge-guided proactive reasoning, which embeds DSM-5 criteria in a knowledge graph to steer questioning; (ii) a theory-informed dual-agent architecture that coordinates a "reasonable-mind" reasoning agent and an "emotional-mind" empathy agent, inspired by Dialectical Behavior Therapy; and (iii) a multi-faceted evaluation strategy covering simulated patients, user studies, clinician review, and ethical assessment. Tested on depression, anxiety, and bipolar disorder, WiseMind attains up to 84.2% diagnostic accuracy, which is comparable to human experts, while outperforming single-agent baselines in perceived empathy and trustworthiness. These results show that deep contextualization across the knowledge, process, and evaluation layers can transform benchmark-driven NLP into clinically meaningful impact.
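
To make components (i) and (ii) more concrete, the following is a minimal sketch of how a graph-guided reasoning step and an empathy step might be coordinated. It is not the authors' implementation. The names `CriterionNode`, `GRAPH`, `reasonable_mind`, `emotional_mind`, and `run_interview` are hypothetical, the graph content is placeholder text rather than DSM-5 criteria, and the empathy agent is a fixed template standing in for an LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class CriterionNode:
    """One node in a toy criteria graph (placeholder content, not DSM-5 text)."""
    name: str
    question: str
    next_if_yes: list = field(default_factory=list)
    next_if_no: list = field(default_factory=list)


# Hypothetical miniature graph used only for illustration.
GRAPH = {
    "low_mood": CriterionNode(
        "low_mood", "Have you been feeling down on most days?",
        next_if_yes=["duration"], next_if_no=["elevated_mood"]),
    "duration": CriterionNode(
        "duration", "Has this lasted for a long stretch of time?"),
    "elevated_mood": CriterionNode(
        "elevated_mood", "Have you had periods of unusually high energy?"),
}


def reasonable_mind(graph, answers, frontier):
    """Reasoning step: return the next unanswered node on the frontier."""
    while frontier:
        name = frontier.pop(0)
        if name not in answers:
            return graph[name]
    return None


def emotional_mind(question, last_answer):
    """Empathy step: wrap the question in supportive framing (template stand-in)."""
    if last_answer is True:
        prefix = "Thank you for sharing that. "
    elif last_answer is False:
        prefix = "Okay, that is helpful to know. "
    else:
        prefix = "I would like to understand how you have been doing. "
    return prefix + question


def run_interview(graph, start, patient):
    """Alternate the two steps until the graph offers no further questions."""
    answers, transcript, frontier, last = {}, [], [start], None
    while (node := reasonable_mind(graph, answers, frontier)) is not None:
        transcript.append(emotional_mind(node.question, last))
        last = patient(node.name)  # simulated patient returns True or False
        answers[node.name] = last
        frontier.extend(node.next_if_yes if last else node.next_if_no)
    return answers, transcript
```

A simulated patient can be any callable that maps a node name to a boolean, for example `lambda name: name in {"low_mood", "duration"}`. Keeping the graph traversal separate from the phrasing means either step could be swapped for a different model without touching the other, which is the kind of role separation the abstract describes.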

Paper Structure

This paper contains 37 sections, 13 figures, 11 tables, 4 algorithms.

Figures (13)

  • Figure 1: The WiseMind Framework integrates three core components: a multi-agent reasoning workflow, a structured knowledge graph, and a multifaceted evaluation strategy. The system operates through coordinated action determination and question generation guided by the knowledge graph, while continuous evaluation across multiple dimensions ensures clinical effectiveness and ongoing refinement.
  • Figure 2: Three-tier Evaluation Framework for Psychiatric AI. Tier 1 uses AI patient simulation to assess diagnostic accuracy (CN-Recall, DDx-ACC). Tier 2 involves human patient actors to evaluate user experience (Help., Emp.). Tier 3 engages medical professionals to assess clinical validity (Spec., Prec.).
  • Figure 3: Practical analysis. (a) Benchmarking the WiseMind framework with different base models; the trend suggests that base models differ in their strengths on the Reasonable Mind and Emotional Mind roles. (b) Performance comparison between same-base and mixed-base multi-agent configurations in the WiseMind framework for closed-source LLMs; (c) the same comparison for open-source LLMs. In both settings, assigning different LLMs to the Reasonable Mind Agent (RA) and Emotional Mind Agent (EA), based on task-specific strengths, significantly improves diagnostic accuracy, empathy, and medical realism over using the same model for both roles.
  • Figure 4: Example of DDx decision tree for depressed mood.
  • Figure 5: Structured knowledge graph for depressed mood.
  • ...and 8 more figures
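
The Tier 1 metrics named in the Figure 2 caption, CN-Recall and DDx-ACC, are not defined in this excerpt. The sketch below assumes one plausible reading: DDx-ACC as the fraction of simulated cases whose predicted diagnosis matches the reference label, and CN-Recall as the fraction of required criteria nodes that the interview actually covered. Both definitions, and the function names `ddx_accuracy` and `criteria_recall`, are assumptions made for illustration and may differ from the paper's.

```python
def ddx_accuracy(predicted, gold):
    """Fraction of cases where the predicted diagnosis equals the reference label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def criteria_recall(asked, required):
    """Fraction of required criteria nodes that were covered during the interview."""
    required = set(required)
    if not required:
        raise ValueError("at least one required criterion is needed")
    return len(required & set(asked)) / len(required)
```

With made-up labels, `ddx_accuracy(["dep", "bip", "anx", "dep"], ["dep", "bip", "dep", "dep"])` gives 0.75, and `criteria_recall({"a", "b", "x"}, {"a", "b", "c", "d"})` gives 0.5.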