PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang

TL;DR

The paper addresses the need for domain-specific, knowledge-grounded LLMs capable of psychological understanding and evaluation by proposing PsycoLLM, a psychology-focused model trained on a high-quality dataset comprising single-turn QA, multi-turn dialogues, and knowledge-based QA. It introduces a three-stage data-creation pipeline for multi-turn dialogues and a teacher-student framework for knowledge-based QA, along with a benchmark based on authoritative Chinese psychological counseling examinations to assess ethics, theory, and case analysis. Empirical results show PsycoLLM achieving superior performance on the benchmark compared with a wide range of baselines, while maintaining general reasoning ability to a reasonable extent. The work provides a practical pathway for safer, more reliable mental health support via LLMs and highlights future directions such as multimodal data and bias mitigation.

Abstract

Mental health has attracted substantial attention in recent years, and large language models (LLMs) can be an effective technology for addressing mental health problems owing to their capabilities in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets that lack crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In this paper, we propose a specialized psychological LLM, named PsycoLLM, trained on a proposed high-quality psychological dataset that includes single-turn QA, multi-turn dialogues, and knowledge-based QA. Specifically, we construct multi-turn dialogues through a three-step pipeline comprising multi-turn QA generation, evidence judgment, and dialogue refinement. We augment this process with real-world psychological case backgrounds extracted from online platforms, enhancing the relevance and applicability of the generated data. Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis. The experimental results on the benchmark illustrate the effectiveness of PsycoLLM, which outperforms the other LLMs evaluated.
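
The three-step pipeline named in the abstract (multi-turn QA generation, evidence judgment, dialogue refinement) can be pictured with a minimal sketch. Everything below is hypothetical and is not the authors' implementation: the names `Turn`, `build_dialogue`, `generate`, `is_supported`, and `refine` are illustrative, the three callables stand in for whatever LLM prompts the paper actually uses, and the rule of rejecting a dialogue when a counselor turn lacks supporting evidence is an assumption made only to keep the example concrete.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Turn:
    speaker: str  # "client" or "counselor" (labels assumed for illustration)
    text: str


Dialogue = List[Turn]


def build_dialogue(
    case_background: str,
    generate: Callable[[str], Dialogue],
    is_supported: Callable[[str, Turn], bool],
    refine: Callable[[Dialogue], Dialogue],
) -> Optional[Dialogue]:
    """Run one case background through three stages; return None if rejected."""
    # Step 1: draft a multi-turn dialogue from the case background.
    draft = generate(case_background)

    # Step 2: evidence judgment. Assumption: a dialogue is rejected when any
    # counselor turn is not supported by the case background.
    for turn in draft:
        if turn.speaker == "counselor" and not is_supported(case_background, turn):
            return None

    # Step 3: refine the surviving dialogue (for example, for empathy).
    return refine(draft)
```

Passing the three stages in as callables keeps the control flow testable without any model access; in practice each callable would wrap a prompt to a generator or judge model.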

Paper Structure (23 sections, 3 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Overview of dataset preparation, including single-turn QA, multi-turn dialogue, and knowledge QA generation.
  • Figure 2: The topic distribution. We divide the data into 9 topics and report their percentages.
  • Figure 3: Word cloud of psychological consultants' responses in the single-turn QA dataset.
  • Figure 4: Examples of the generated multi-turn dialogue data: Step 1 involves data generation, Step 2 focuses on evidence judgment and integration, and Step 3 entails revision for aspects such as empathy.
  • Figure 5: Examples of MCQs in the proposed benchmark, including SMCQ, MMCQ and case-based MCQs, respectively.
  • ...and 1 more figures