Large Language Models for Interpretable Mental Health Diagnosis
Brian Hyeongseok Kim, Chao Wang
TL;DR
The paper addresses the challenge of accurate and interpretable mental health diagnosis given lengthy diagnostic manuals. It proposes a hybrid pipeline where an LLM translates narrative diagnostic criteria into a Datalog rule set that a constraint logic programming engine executes to yield a diagnosis, with expert verification to ensure fidelity. Results show that expert-corrected logic programs achieve perfect diagnostic accuracy on a 30-patient set, while LLM-only approaches exhibit variability and interpretability issues. The work demonstrates a practical path toward explainable, auditable clinical decision support in psychiatry and motivates further real-world evaluation and refinement of the methodology.
Abstract
We propose a clinical decision support system (CDSS) for mental health diagnosis that combines the strengths of large language models (LLMs) and constraint logic programming (CLP). A CDSS is important because the diagnostic manuals used by mental health professionals are highly complex and diagnostic errors are dangerous. Our CDSS is a software tool that uses an LLM to translate diagnostic manuals into a logic program, then solves the program with an off-the-shelf CLP engine to query a patient's diagnosis based on the encoded rules and the provided patient data. By giving domain experts the opportunity to inspect the LLM-generated logic program and modify it when needed, our CDSS ensures that the diagnosis is not only accurate but also interpretable. We experimentally compare it with two baseline approaches: diagnosing patients with an LLM alone, and using the LLM-generated logic program without expert inspection. The results show that, while LLMs are extremely useful for generating candidate logic programs, these programs still require expert inspection and modification to guarantee faithfulness to the official diagnostic manuals. Additionally, the direct use of patient data in LLMs raises ethical concerns, underscoring the need for a safer hybrid approach such as our proposed method.
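To make the idea of an inspectable, rule-based diagnosis concrete, the following is a minimal Python sketch and not the authors' implementation. It stands in for the logic program with a single hypothetical rule of the form "at least k of these symptoms for at least n weeks". The names `Rule`, `diagnose`, `TOY_RULE`, and the symptom labels `s1` to `s4` are assumptions made for illustration and are not taken from any diagnostic manual or from the paper. A real system along the lines described above would express such rules in a logic programming language and hand them to a CLP engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: at least `min_count` of `symptoms`,
    lasting at least `min_weeks` weeks."""
    diagnosis: str
    symptoms: frozenset
    min_count: int
    min_weeks: int


def diagnose(rule, patient):
    """Return (verdict, trace); the trace records every check performed,
    so a reviewer can see why the verdict was reached."""
    present = sorted(rule.symptoms & set(patient["symptoms"]))
    count_ok = len(present) >= rule.min_count
    duration_ok = patient["weeks"] >= rule.min_weeks
    trace = [
        f"matched symptoms {present}, need {rule.min_count}: {count_ok}",
        f"duration {patient['weeks']} weeks, need {rule.min_weeks}: {duration_ok}",
    ]
    return count_ok and duration_ok, trace


# Toy rule and toy patients (illustrative values only, not clinical criteria).
TOY_RULE = Rule("toy_disorder", frozenset({"s1", "s2", "s3", "s4"}), 3, 2)
patient_a = {"symptoms": ["s1", "s2", "s4"], "weeks": 3}
patient_b = {"symptoms": ["s1", "s2"], "weeks": 5}

if __name__ == "__main__":
    for name, patient in [("patient_a", patient_a), ("patient_b", patient_b)]:
        verdict, trace = diagnose(TOY_RULE, patient)
        print(name, verdict)
        for line in trace:
            print("  ", line)
```

Because the rule is plain data and every check is logged, an expert could read the rule, correct a threshold, and rerun it without sending patient data to an LLM, which is the property the hybrid approach relies on.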
