A Large Language Model Outperforms Other Computational Approaches to the High-Throughput Phenotyping of Physician Notes

Syed I. Munzir; Daniel B. Hier; Chelsea Oommen; Michael D. Carrithers

TL;DR

This study compares three computational approaches to high-throughput phenotyping: a large language model (LLM) incorporating generative AI, a deep learning approach utilizing span categorization, and a machine learning (ML) approach with word embeddings.

Abstract

High-throughput phenotyping, the automated mapping of patient signs and symptoms to standardized ontology concepts, is essential to gaining value from electronic health records (EHR) in the support of precision medicine. Despite technological advances, high-throughput phenotyping remains a challenge. This study compares three computational approaches to high-throughput phenotyping: a Large Language Model (LLM) incorporating generative AI, a Natural Language Processing (NLP) approach utilizing deep learning for span categorization, and a hybrid approach combining word vectors with machine learning. The approach that implemented GPT-4 (a Large Language Model) demonstrated superior performance, suggesting that Large Language Models are poised to be the preferred method for high-throughput phenotyping of physician notes.

Paper Structure (5 sections, 5 figures)

This paper contains 5 sections, 5 figures.

Abstract
Introduction
Methods
Results
Discussion

Figures (5)

Figure 1: Annotation screens in Prodigy for text spans indicating weakness. The annotator has a choice of 20 labels for selected text spans.
Figure 2: Because of class imbalance in the training dataset for the NLP spancat model, synthetic data were added, increasing the number of annotated lines from 11,688 to 15,052 and enlarging the minority classes ON (optic neuritis), seizure, sleep, and tremor. Further training examples were also added to the hyperreflexia, hyporeflexia, and weakness classes because of low recall in these classes (see Discussion).
Figure 3: Examples of seed terms used to generate simclins (a) and examples of positive and negated text spans for phenotype identified by NimbleMiner (b).
Figure 4: For a physician note, GPT-4 output a list of phenotypes (a) and explanations for its choices (b).
Figure 5: Heat map showing precision, recall, and accuracy for three high-throughput phenotyping approaches: Hybrid, NLP, and LLM. Individual phenotype category metrics are micro averages; the overall metrics are macro averages. Abbreviations include CN (cranial nerve and brainstem), EOM (extraocular movements), and ON (optic neuritis). The category paresthesias includes sensory loss, numbness, and tingling.
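
The Figure 5 caption distinguishes micro averages (within a phenotype category) from macro averages (across categories). The sketch below illustrates one common reading of that distinction, and it is not the authors' evaluation code. It assumes that a micro average pools true-positive, false-positive, and false-negative counts over all notes before computing a metric, and that a macro average is the unweighted mean of the per-category metrics. The `Counts` class, the helper names, and the toy counts are hypothetical and do not come from the paper; accuracy is omitted because the excerpt does not say how it was computed.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-note tallies for one phenotype category."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def pool(per_note: list[Counts]) -> Counts:
    """Micro averaging step: sum the counts over all notes first."""
    return Counts(
        tp=sum(c.tp for c in per_note),
        fp=sum(c.fp for c in per_note),
        fn=sum(c.fn for c in per_note),
    )


def precision(c: Counts) -> float:
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def macro(values: list[float]) -> float:
    """Macro averaging step: unweighted mean of per-category metrics."""
    return sum(values) / len(values)


# Toy counts, invented purely for illustration (not results from the paper).
toy = {
    "weakness": [Counts(3, 1, 0), Counts(1, 0, 1)],
    "tremor": [Counts(1, 0, 0), Counts(1, 1, 1)],
}

# Per-category (micro-averaged) metrics, then the overall (macro) metrics.
pooled = {name: pool(notes) for name, notes in toy.items()}
category_precision = {name: precision(c) for name, c in pooled.items()}
category_recall = {name: recall(c) for name, c in pooled.items()}
overall_precision = macro(list(category_precision.values()))
overall_recall = macro(list(category_recall.values()))

for name in toy:
    print(f"{name}: precision={category_precision[name]:.3f} "
          f"recall={category_recall[name]:.3f}")
print(f"overall: precision={overall_precision:.3f} recall={overall_recall:.3f}")
```

Under this reading, a rare category such as tremor weighs as much in the overall score as a common one such as weakness, which is why macro averaging is often chosen when classes are imbalanced.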
