Using LLMs to Aid Annotation and Collection of Clinically-Enriched Data in Bipolar Disorder and Schizophrenia

Ankit Aich, Avery Quynh, Pamela Osseyi, Amy Pinkham, Philip Harvey, Brenda Curtis, Colin Depp, Natalie Parde

TL;DR

This work demonstrates that fine-tuned Seq2Seq language models can effectively assist in both collecting clinically enriched data and annotating it for domain-specific variables in bipolar disorder and schizophrenia. By pairing a context-aware interviewer with a dedicated annotation model, the authors build a scalable pipeline that outperforms large commercial LLMs on domain tasks and maintains high inter-annotator reliability without making diagnostic claims. A chained pipeline further shows end-to-end viability for data collection and scoring with minimal performance loss. The study emphasizes practical utility, ethical safeguards, and potential for broader adoption in clinical research, while acknowledging limitations such as sample size and modality scope.

Abstract

NLP in mental health has been primarily focused on social media. Real-world practitioners also have high caseloads and often work with domain-specific variables for which modern LLMs lack context. We use a dataset built by recruiting 644 participants, including individuals diagnosed with Bipolar Disorder (BD) or Schizophrenia (SZ) and Healthy Controls (HC). Participants undertook tasks derived from a standardized mental health instrument, and the resulting data were transcribed and annotated by experts across five clinical variables. This paper demonstrates the application of contemporary language models in sequence-to-sequence tasks to enhance mental health research. Specifically, we illustrate how these models can facilitate the deployment of mental health instruments, data collection, and data annotation with high accuracy and scalability. We show that small models are capable of annotating domain-specific clinical variables and collecting data for mental health instruments, and that they perform better than large commercial models.

Paper Structure (23 sections, 1 equation, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Our method creates a fine-tuned model. This model can interact directly with recruited participants to help them undertake established mental health instruments through turn-based tasks, and it can annotate clinical variables with low error. Commercial LLMs like GPT-4 / GPT-4o, by contrast, cannot annotate clinical variables that are niche to a domain.
  • Figure 2: The interview model uses turns and dialogue history to calculate reconstruction loss and generate sequences that are well aligned with the SSPA.
  • Figure 3: Chained Model Setup. Two standalone T5 models are chained by output and input. The interview generator model works with patient dialogues to create LLM-generated transcripts. These transcripts are fed into the score prediction model, which is trained with a cross-entropy loss function and outputs low-error scores for the SSPA.
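
The chained setup described in Figure 3 can be sketched roughly as follows. This is a minimal illustration of the data flow only, not the authors' implementation: the class `ChainedPipeline`, the type aliases, and the two toy functions are hypothetical names introduced here, and the toy functions are plain-Python stand-ins for what would be two fine-tuned T5 models. The toy scorer returns a turn count, not an SSPA score.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins: in a real system each callable would wrap a
# fine-tuned seq2seq model; here they are ordinary functions.
InterviewFn = Callable[[List[str]], str]  # dialogue history -> next interviewer turn
ScoreFn = Callable[[str], int]            # full transcript -> predicted score


@dataclass
class ChainedPipeline:
    """Chains an interview generator into a score predictor by output and input."""
    interviewer: InterviewFn
    scorer: ScoreFn
    history: List[str] = field(default_factory=list)

    def step(self, participant_turn: str) -> str:
        """Record the participant's turn, then generate the interviewer's reply."""
        self.history.append(f"participant: {participant_turn}")
        reply = self.interviewer(self.history)
        self.history.append(f"interviewer: {reply}")
        return reply

    def transcript(self) -> str:
        """Join all turns so far into a single transcript string."""
        return "\n".join(self.history)

    def score(self) -> int:
        """Feed the generated transcript into the scoring stage."""
        return self.scorer(self.transcript())


def toy_interviewer(history: List[str]) -> str:
    # Placeholder reply that only depends on how many turns exist so far.
    return f"Could you tell me more? (history length {len(history)})"


def toy_scorer(transcript: str) -> int:
    # Placeholder: counts participant turns. This is NOT a clinical score.
    return sum(line.startswith("participant:") for line in transcript.splitlines())


pipeline = ChainedPipeline(toy_interviewer, toy_scorer)
first_reply = pipeline.step("Hello.")
pipeline.step("It started last week.")
print(pipeline.transcript())
print(pipeline.score())
```

Keeping the two stages behind plain callables means either one can be swapped out (for example, replacing the generated transcript with a human-collected one) without touching the other, which mirrors the "two standalone models" framing in the caption.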