Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models

Isabelle Lorge, Dan W. Joyce, Niall Taylor, Alejo Nevado-Holgado, Andrea Cipriani, Andrey Kormilitzin

TL;DR

This work demonstrates that a span-extraction model trained exclusively on LLM-generated synthetic narrative clinical notes can identify prognostic factors for difficult-to-treat depression in real EHR data. By operationalizing the DTD phenotype into abductively annotated PATIENT/ILLNESS/TREATMENT factors and comparing token-, span-, and sentence-level BERT-based approaches, the study achieves strong performance on a subset of clinically important factors (e.g., abuse, suicidality, family history) with high precision when using a conservative confidence threshold. The results suggest a viable, privacy-preserving route to clinical decision support in psychiatry, while highlighting challenges such as negation handling, label noise in synthetic data, and generalization to negative classes. Future work could further improve robustness with domain-pretrained models and more advanced synthetic-data generation strategies, enabling broader deployment across phenotypes and healthcare settings.

Abstract

Difficult-to-treat depression (DTD) has been proposed as a broader and more clinically comprehensive perspective on a person's depressive disorder where, despite treatment, they continue to experience significant burden. We sought to develop a Large Language Model (LLM)-based tool capable of interrogating routinely collected, narrative (free-text) electronic health record (EHR) data to locate published prognostic factors that capture the clinical syndrome of DTD. In this work, we use LLM-generated synthetic data (GPT-3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. The resulting model is then able to extract and label spans related to a variety of relevant positive and negative factors in real clinical data (i.e., spans of text that increase or decrease the likelihood of a patient matching the DTD syndrome). By training the model exclusively on synthetic data, we show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors such as history of abuse, family history of affective disorder, illness severity and suicidality. Our results show promise for future healthcare applications, especially those where highly confidential medical data and human-expert annotation would normally be required.
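
The abstract names Non-Maximum Suppression as part of the span-extraction pipeline but does not spell out the procedure. As a rough illustration of the general idea only, the sketch below applies greedy NMS to scored candidate spans: the highest-scoring span is kept, and any lower-scoring span that overlaps a kept span too heavily is discarded. The `Span` type, the token-level Jaccard overlap measure, the class-agnostic suppression, and the `overlap_threshold` and `min_score` parameters are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical candidate span over token indices (end is exclusive)."""
    start: int
    end: int
    label: str
    score: float


def token_jaccard(a: Span, b: Span) -> float:
    """Jaccard overlap of two spans, measured in tokens."""
    inter = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


def span_nms(spans: List[Span],
             overlap_threshold: float = 0.5,
             min_score: float = 0.0) -> List[Span]:
    """Greedy NMS: visit spans by descending score, keep a span only if it
    does not overlap any already-kept span by more than the threshold.
    Suppression here ignores labels (an assumption of this sketch)."""
    kept: List[Span] = []
    for cand in sorted(spans, key=lambda s: s.score, reverse=True):
        if cand.score < min_score:
            break  # remaining candidates all score lower still
        if all(token_jaccard(cand, k) <= overlap_threshold for k in kept):
            kept.append(cand)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda s: s.start)


candidates = [
    Span(0, 4, "abuse", 0.90),
    Span(1, 4, "abuse", 0.60),          # overlaps the span above (0.75)
    Span(10, 13, "suicidality", 0.80),
    Span(10, 12, "suicidality", 0.30),  # overlaps the span above (~0.67)
]
selected = span_nms(candidates)
confident = span_nms(candidates, min_score=0.85)
```

Raising `min_score` trades recall for precision, which is one simple way to picture the conservative confidence threshold mentioned in the TL;DR; the actual thresholding used in the paper may differ.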

Paper Structure

This paper contains 28 sections, 1 equation, 4 figures, and 9 tables.

Figures (4)

  • Figure 1: Span-level model architecture.
  • Figure 2: Log-scaled confusion matrix (synthetic data).
  • Figure 3: Average explicit mentions of all label words.
  • Figure 4: Average Jaccard similarities of span pairs.