High Throughput Phenotyping of Physician Notes with Large Language and Hybrid NLP Models
Syed I. Munzir, Daniel B. Hier, Michael D. Carrithers
TL;DR
The paper addresses the need for high-throughput deep phenotyping of physician notes in electronic health records to support precision medicine. It compares NimbleMiner, a hybrid NLP method that combines word embeddings with a support vector machine, against GPT-4, a general-purpose large language model, on the task of phenotyping 547 multiple sclerosis notes across 19 neurological phenotype categories, with ground-truth labels produced in Prodigy. Both approaches achieve high accuracy (0.87 for NimbleMiner and 0.85 for GPT-4), approaching the human inter-annotator agreement ceiling (κ ≈ 0.90). GPT-4 is easy to configure and requires no training data, while NimbleMiner provides transparency and fast recall given proper lexicon design. The findings suggest that LLMs may become the dominant method for high-throughput deep phenotyping of clinical notes, though broader validation on diverse corpora is needed to assess generalizability and computational costs.
Abstract
Deep phenotyping is the detailed description of patient signs and symptoms using concepts from an ontology. The deep phenotyping of the numerous physician notes in electronic health records requires high throughput methods. Over the past thirty years, progress has been made toward making high throughput phenotyping feasible. In this study, we demonstrate that a large language model and a hybrid NLP model (combining word vectors with a machine learning classifier) can perform high throughput phenotyping on physician notes with high accuracy. Large language models will likely emerge as the preferred method for high throughput deep phenotyping of physician notes.
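
The summary compares model accuracy against human inter-annotator agreement expressed as Cohen's κ. As a rough illustration of how these two quantities relate, the sketch below computes both for a single phenotype category. It is a minimal example under stated assumptions, not the evaluation code used in the study: the function names `accuracy` and `cohen_kappa` and the toy label lists are hypothetical, and the labels are assumed to be binary (1 = phenotype present in the note, 0 = absent).

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of notes where the predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    p_observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label
    # if each labelled independently at their own marginal rates.
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy labels for one hypothetical phenotype category (illustrative only).
gold = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]

print("accuracy:", accuracy(gold, pred))
print("kappa:", cohen_kappa(gold, pred))
```

Because κ discounts agreement expected by chance, it is usually lower than raw accuracy on the same labels, so the two figures in the summary are not directly interchangeable and are best read as an approximate comparison.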
