Table of Contents
Fetching ...

Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues

Michimasa Inaba, Mariko Ukiyo, Keiko Takamizo

TL;DR

This paper investigates whether large language models can provide psychological counselling in text-based, multiturn dialogues. It builds a GPT-4–driven counsellor by collecting role-play data with expert therapists, annotating counselor intents, and prompting GPT-4 with these annotations and prior dialogue. Third-party professionals evaluated GPT-4 outputs against human counsellors in identical contexts, finding GPT-4 responses largely competitive and with no significant quality gap. The results suggest potential for LLM-based counselling in practice while highlighting the necessity for careful prompting, safety measures, and further validation of fully automated systems in real-world settings.

Abstract

Mental health care poses an increasingly serious challenge to modern societies. In this context, there has been a surge in research that utilizes information technologies to address mental health problems, including those aiming to develop counseling dialogue systems. However, there is a need for more evaluations of the performance of counseling dialogue systems that use large language models. For this study, we collected counseling dialogue data via role-playing scenarios involving expert counselors, and the utterances were annotated with the intentions of the counselors. To determine the feasibility of a dialogue system in real-world counseling scenarios, third-party counselors evaluated the appropriateness of responses from human counselors and those generated by GPT-4 in identical contexts in role-play dialogue data. Analysis of the evaluation results showed that the responses generated by GPT-4 were competitive with those of human counselors.

Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues

TL;DR

This paper investigates whether large language models can provide psychological counselling in text-based, multiturn dialogues. It builds a GPT-4–driven counsellor by collecting role-play data with expert therapists, annotating counselor intents, and prompting GPT-4 with these annotations and prior dialogue. Third-party professionals evaluated GPT-4 outputs against human counsellors in identical contexts, finding GPT-4 responses largely competitive and with no significant quality gap. The results suggest potential for LLM-based counselling in practice while highlighting the necessity for careful prompting, safety measures, and further validation of fully automated systems in real-world settings.

Abstract

Mental health care poses an increasingly serious challenge to modern societies. In this context, there has been a surge in research that utilizes information technologies to address mental health problems, including those aiming to develop counseling dialogue systems. However, there is a need for more evaluations of the performance of counseling dialogue systems that use large language models. For this study, we collected counseling dialogue data via role-playing scenarios involving expert counselors, and the utterances were annotated with the intentions of the counselors. To determine the feasibility of a dialogue system in real-world counseling scenarios, third-party counselors evaluated the appropriateness of responses from human counselors and those generated by GPT-4 in identical contexts in role-play dialogue data. Analysis of the evaluation results showed that the responses generated by GPT-4 were competitive with those of human counselors.
Paper Structure (7 sections, 1 figure, 9 tables)

This paper contains 7 sections, 1 figure, 9 tables.

Figures (1)

  • Figure 1: Ratio of rating scores assigned to utterances