Evaluating GPT's Capability in Identifying Stages of Cognitive Impairment from Electronic Health Data

Yu Leng, Yingnan He, Colin Magdamo, Ana-Maria Vranceanu, Christine S. Ritchie, Shibani S. Mukerji, Lidia M. V. R. Moura, John R. Dickson, Deborah Blacker, Sudeshna Das

TL;DR

This work investigates GPT-4o's ability to identify stages of cognitive impairment from unstructured electronic health records. Using two real-world datasets (MGH memory-clinic notes for global CDR scoring, and a 3-year Medicare note set for syndromic staging into NC, MCI, or dementia), the authors evaluate zero-shot GPT-4o with and without retrieval augmentation and prompt-engineering techniques. The model agrees closely with clinician labels: a weighted kappa of 0.83 on the memory-clinic task and 0.91 (0.96 for high-confidence cases) on the Medicare task, indicating strong potential for scalable chart reviews and clinical support. However, the study also highlights biases in documentation and access, underscoring the need for multi-institution validation and bias-mitigation strategies before deployment in routine care.

Abstract

Identifying cognitive impairment within electronic health records (EHRs) is crucial not only for timely diagnoses but also for facilitating research. Information about cognitive impairment often exists within unstructured clinician notes in EHRs, but manual chart reviews are both time-consuming and error-prone. To address this issue, our study evaluates an automated approach using zero-shot GPT-4o to determine the stage of cognitive impairment in two different tasks. First, we evaluated the ability of GPT-4o to determine the global Clinical Dementia Rating (CDR) on specialist notes from 769 patients who visited the memory clinic at Massachusetts General Hospital (MGH), where it achieved a weighted kappa score of 0.83. Second, we assessed GPT-4o's ability to differentiate between normal cognition, mild cognitive impairment (MCI), and dementia on all notes in a 3-year window from 860 Medicare patients. GPT-4o attained a weighted kappa score of 0.91 in comparison to specialist chart reviews and 0.96 on cases that the clinical adjudicators rated with high confidence. Our findings demonstrate GPT-4o's potential as a scalable chart review tool for creating research datasets and assisting diagnosis in clinical settings in the future.
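Both headline results are reported as weighted kappa, an agreement statistic for ordered categories that penalizes a disagreement more the further apart the two ratings are. The sketch below shows one way such a score can be computed. It is an illustration only, not the authors' evaluation code: the function name `weighted_kappa`, the choice between linear and quadratic weights (the abstract does not say which scheme was used), and the toy labels are all assumptions.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's weighted kappa for two raters over ordered categories.

    `categories` lists the labels from least to most severe.
    `weights` is "linear" or "quadratic" (an assumption; the source
    does not state which weighting scheme was used).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")

    index = {label: i for i, label in enumerate(categories)}
    n = len(rater_a)

    # Observed joint distribution of (rater_a, rater_b) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal distributions, used for the chance-expected disagreement.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: 0 on the diagonal, 1 at maximum distance.
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    observed_disagreement = sum(
        penalty(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    expected_disagreement = sum(
        penalty(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy labels, invented purely for illustration (not study data).
    stages = ["NC", "MCI", "dementia"]
    clinician = ["NC", "NC", "MCI", "MCI", "dementia", "dementia"]
    model = ["NC", "MCI", "MCI", "MCI", "dementia", "MCI"]
    print(weighted_kappa(clinician, model, stages, weights="linear"))
    print(weighted_kappa(clinician, model, stages, weights="quadratic"))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so scores of 0.83 and 0.91 indicate that most of the model's disagreements with clinicians were small in ordinal distance.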

Paper Structure

This paper contains 10 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: GPT-4o Performance on Two Studies (Top Row: Study I; Bottom Row: Study II)