Large language models for mental health

Andreas Triantafyllopoulos, Yannik Terhorst, Iosif Tsangko, Florian B. Pokorny, Katrin D. Bartl-Pokorny, Lennart Seizer, Ayal Klein, Jenny Chim, Dana Atzil-Slonim, Maria Liakata, Markus Bühner, Johanna Löchner, Björn Schuller

TL;DR

A narrative review attempts to bridge the gap between the community developing large language models and the one which may benefit from them by providing intuitive explanations behind the basic concepts related to contemporary LLMs.

Abstract

Digital technologies have long been explored as a complement to standard procedure in mental health research and practice, ranging from the management of electronic health records to app-based interventions. The recent emergence of large language models (LLMs), both proprietary and open-source ones, represents a major new opportunity on that front. Yet there is still a divide between the community developing LLMs and the one which may benefit from them, thus hindering the beneficial translation of the technology into clinical use. This divide largely stems from the lack of a common language and understanding regarding the technology's inner workings, capabilities, and risks. Our narrative review attempts to bridge this gap by providing intuitive explanations behind the basic concepts related to contemporary LLMs.

Paper Structure

This paper contains 23 sections, 5 figures.

Figures (5)

  • Figure 1: Overview of potential large language model (LLM) applications in mental health, loosely inspired by the review of Bendig22-TNG. Psychologists may use an LLM to offload mundane tasks, such as note or report summarisation, or to obtain feedback regarding a particular therapy plan or even a single therapy session. Further, they might use an LLM to simulate patient responses for the purposes of training. With respect to patients, LLMs can interact with them with different degrees of autonomy, ranging from a fully-fledged psychotherapy session to mere reminders regarding medication. We note that some of those tasks can be achieved using more traditional software. However, LLMs both provide new avenues for improving efficiency and offer a uniform platform that can achieve multiple tasks concurrently.
  • Figure 2: Example of a hypothetical LLM response to a patient's intent to quit their antidepressant medication. In bolder font, we show alternative instructions given to the model, which condition the type of its response. These instructions can be used to guide the LLM towards particular types of responses that are conducive to a specific therapeutic plan.
  • Figure 3: Typical workflow for LLM developers (left) and proposed workflow for psychologists who wish to adopt LLMs in their practice (right). Developers follow a multi-stage training process (see \ref{sec:train}) to prepare their LLMs, and often continue to monitor and improve them even after deployment. Psychologists should carefully select the use case they want to automate using LLMs, conduct a preliminary evaluation of alternative models, and establish rigorous criteria for monitoring the effectiveness of LLMs with respect to patient outcomes. Moreover, the more technically versed can optionally finetune models on their own data, a process which can improve performance.
  • Figure 4: Overview of autoregressive output generation by an LLM. The left-side panel shows a schematic of the major steps in an LLM workflow: a) the input text is split into words or sub-words (see \ref{sec:tokens}) and subsequently converted to a numeric representation; b) this numeric representation becomes the input to the LLM; c) the LLM outputs the most likely next word in the sequence (as a numeric representation that can be interpreted as a word). The right-side panel shows the initial step of this autoregressive process for generating one of the responses shown in \ref{fig:example}. Note that the output word at each step becomes part of the input for the next step (hence the name autoregressive).
  • Figure 5: Overview of the inner workings of the transformer architecture which underpins most contemporary LLMs. The leftmost panel shows the inner workings of the attention layer, a key component of the transformer architecture. The initial numerical representations of words (green table) are first linearly transformed into a specific contextualised representation (orange table), which is then used to compute similarities across all words in the input (resulting in the purple matrix). The linear transformation is learnt from data and is used to specify different notions of similarity (e.g., grammatical vs affective similarity; see text for more details). This is the first stage of the attention layer. The middle panel shows the second stage of the attention layer, which computes a weighted representation for each word, derived from its (contextualised) similarity with all other words in the input. In a nutshell, this transfers information across all words in a sentence. In our example, this could be used to identify that the verb taking corresponds to my antidepressants; the representation of taking would then be enhanced by the representation of my antidepressants, thus placing this word in the correct context for this sentence. Given that several of those layers are applied one after another, a later layer would then connect stop with taking my antidepressants, to finally identify the intent of the patient. Finally, the rightmost panel shows the workings of the multilayer perceptron which processes the output of each attention layer; this further contextualises the words in the sentence with information learnt during training. In our example, this could be used to identify that the phrase stop taking my antidepressants is associated with negative connotations.
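
The mechanisms described in the captions of Figures 4 and 5 can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (attention, generate, toy_next_word_probs), the toy vectors, and the toy probability table are all hypothetical and chosen only for illustration. The attention function follows the simplified two-stage description in the Figure 5 caption, with a single linear transformation; practical transformers typically use separate query, key, and value projections and scale the similarities. The toy next-word table looks only at the last word, whereas an LLM conditions on the whole input.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(word_vectors, transform):
    """Simplified attention layer, following the Figure 5 caption.

    Assumption: one shared linear transformation instead of the separate
    query/key/value projections used in practical transformers.
    """
    # Linearly transform the initial representations (green -> orange table).
    contextual = [
        [sum(vec[k] * transform[k][j] for k in range(len(transform)))
         for j in range(len(transform[0]))]
        for vec in word_vectors
    ]
    # Stage 1: similarity of every word with every other word (purple matrix).
    similarities = [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in contextual]
        for row_i in contextual
    ]
    # Stage 2: each word becomes a similarity-weighted mix of all words.
    weights = [softmax(row) for row in similarities]
    mixed = [
        [sum(weights[i][j] * contextual[j][k] for j in range(len(contextual)))
         for k in range(len(contextual[0]))]
        for i in range(len(contextual))
    ]
    return weights, mixed


# Hypothetical stand-in for an LLM: maps the last word to next-word probabilities.
TOY_TABLE = {
    "stop": {"taking": 0.8, "<end>": 0.2},
    "taking": {"my": 0.9, "<end>": 0.1},
    "my": {"antidepressants": 0.7, "medication": 0.3},
    "antidepressants": {"<end>": 1.0},
}


def toy_next_word_probs(words):
    """Toy model: conditions only on the last word (a real LLM uses all of them)."""
    return TOY_TABLE.get(words[-1], {"<end>": 1.0})


def generate(prompt_words, next_word_probs, max_new_words=10, stop_word="<end>"):
    """Autoregressive loop from Figure 4: each output is appended to the input."""
    words = list(prompt_words)
    for _ in range(max_new_words):
        probs = next_word_probs(words)
        next_word = max(probs, key=probs.get)  # pick the most likely next word
        if next_word == stop_word:
            break
        words.append(next_word)
    return words


# Three toy 2-dimensional word vectors and an identity transformation.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attn_weights, attn_output = attention(toy_vectors, identity)

completion = generate(["I", "want", "to", "stop"], toy_next_word_probs)
```

In this sketch, each row of attn_weights holds the weights one word assigns to all words in the input, and completion grows one word at a time until the toy table yields the stop word. Always picking the single most likely word is only one possible decoding choice; it is used here because it matches the wording of the Figure 4 caption.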