ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes

Zhichao Yang, Avijit Mitra, Sunjae Kwon, Hong Yu

TL;DR

ClinicalMamba addresses the challenge of modeling long-range information in clinical notes by extending the context length to 16,000 tokens, using the selective state-space mechanism of a Mamba-based architecture. Pretrained on longitudinal MIMIC-III notes, the 130M- and 2.8B-parameter variants show superior long-context information extraction, outperforming Mamba, clinical Llama, and zero-shot GPT-4 on tasks such as cohort selection and ICD coding while maintaining favorable perplexity-throughput trade-offs. The work introduces a prompt-based fine-tuning approach for few-shot adaptation and publicly releases the models to foster longitudinal clinical NLP research. Overall, the results suggest that long-context generative clinical LMs can achieve high accuracy with reduced compute, enabling scalable longitudinal analysis of patient histories.
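The selective state-space mechanism can be illustrated with a toy recurrence. The sketch below is a simplified, single-channel version of a Mamba-style selective scan, not ClinicalMamba's implementation: the step size delta and the vectors B and C are computed from the current input (the "selective" part), A is diagonal, A is discretised with a zero-order hold and B with a simplified Euler step. All names and weights (`selective_scan`, `w_delta`, `b_delta`, `w_b`, `w_c`) are hypothetical. Because the hidden state has a fixed size, each new token costs the same work no matter how many tokens precede it, which is one reason long contexts are tractable for this family of models.

```python
import math
from typing import List, Sequence


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: Sequence[float],
    a: Sequence[float],
    w_delta: float,
    b_delta: float,
    w_b: Sequence[float],
    w_c: Sequence[float],
) -> List[float]:
    """Toy single-channel selective state-space scan (hypothetical weights).

    a holds the negative diagonal entries of the continuous-time matrix A;
    w_delta, b_delta, w_b and w_c are illustrative projection weights that
    make delta, B and C depend on the current input x_t.
    """
    n = len(a)
    h = [0.0] * n  # fixed-size hidden state carried across the whole sequence
    ys: List[float] = []
    for x_t in x:
        # Input-dependent ("selective") step size, B and C.
        delta = softplus(w_delta * x_t + b_delta)
        b_t = [w * x_t for w in w_b]
        c_t = [w * x_t for w in w_c]
        for i in range(n):
            a_bar = math.exp(delta * a[i])     # zero-order-hold discretisation of A
            b_bar = delta * b_t[i]             # simplified (Euler) discretisation of B
            h[i] = a_bar * h[i] + b_bar * x_t  # h_t = A_bar * h_{t-1} + B_bar * x_t
        ys.append(sum(c * s for c, s in zip(c_t, h)))  # y_t = C_t . h_t
    return ys
```

For example, `selective_scan([2.0, 1.0], a=[-1.0], w_delta=0.0, b_delta=0.0, w_b=[1.0], w_c=[1.0])` returns two outputs, the first equal to 8 ln 2, since delta = softplus(0) = ln 2 at that step.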

Abstract

The advancement of natural language processing (NLP) systems in healthcare hinges on the ability of language models to interpret the intricate information contained within clinical notes. This process often requires integrating information from various time points in a patient's medical history. However, most earlier clinical language models were pretrained with a context length limited to roughly one clinical document. In this study, we introduce ClinicalMamba, a specialized version of the Mamba language model pretrained on a vast corpus of longitudinal clinical notes to address the unique linguistic characteristics and information processing needs of the medical domain. ClinicalMamba, available with 130 million and 2.8 billion parameters, demonstrates superior performance in modeling clinical language across extended text lengths compared to Mamba and clinical Llama. With few-shot learning, ClinicalMamba achieves notable results in both speed and accuracy, outperforming existing clinical language models and general-domain large models such as GPT-4 on information extraction tasks over longitudinal clinical notes.
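The abstract contrasts single-document pretraining with pretraining on longitudinal notes. One plausible way to build such training sequences, sketched below under stated assumptions, is to concatenate each patient's notes in chronological order and cut the resulting token stream into fixed-length windows. The record layout (`patient_id`, `chart_time`, `token_ids`), the separator token `sep_id`, the function name `pack_longitudinal`, and the non-overlapping windowing are all hypothetical; the paper's actual preprocessing may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical note record: (patient_id, chart_time, token_ids).
Note = Tuple[str, str, List[int]]


def pack_longitudinal(notes: Iterable[Note], sep_id: int, max_len: int) -> List[List[int]]:
    """Concatenate each patient's notes in time order and cut into max_len windows."""
    by_patient: Dict[str, List[Tuple[str, List[int]]]] = defaultdict(list)
    for patient_id, chart_time, token_ids in notes:
        by_patient[patient_id].append((chart_time, token_ids))

    windows: List[List[int]] = []
    for patient_id in sorted(by_patient):
        stream: List[int] = []
        # Assumes ISO-8601 timestamp strings, which sort chronologically as text.
        for _, token_ids in sorted(by_patient[patient_id], key=lambda n: n[0]):
            stream.extend(token_ids)
            stream.append(sep_id)  # mark the boundary after each note
        # Non-overlapping windows per patient; the last one may be shorter.
        windows.extend(stream[i:i + max_len] for i in range(0, len(stream), max_len))
    return windows
```

With `max_len=16_000`, matching the context length stated in the TL;DR, a patient whose history exceeds that length is split across several windows, while no window ever mixes notes from different patients.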

Paper Structure

This paper contains 14 sections, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Perplexity of different generative language models on MIMIC-III when evaluated at various preceding context lengths (1k, 4k, and 16k tokens). The x-axis is on a log scale. The subfigure shows only the perplexity range 0-100. Experimental settings and detailed results are in the Results section.
  • Figure 2: Illustration of prompt-based fine-tuning.
  • Figure A.1: Long-tail distribution of the number of tokens per visit. The y-axis shows density (summing to 1.0).
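Figure 2 illustrates prompt-based fine-tuning. The exact template and label words used for ClinicalMamba are not given here, so the sketch below shows only the generic cloze-style idea: the note is wrapped in a question template, the language model's next-token logits are read off for one label word per class (a "verbalizer"), and a softmax over just those words gives class probabilities whose cross-entropy is minimised during fine-tuning. The names `build_prompt`, `label_word_probs`, and `prompt_loss`, the template text, and the label words are hypothetical; in practice the logits would come from the model rather than a hand-written dictionary.

```python
import math
from typing import Dict


def build_prompt(note: str, question: str) -> str:
    # Hypothetical template: note text, then a question answered by one label word.
    return f"{note}\nQuestion: {question}\nAnswer:"


def label_word_probs(next_token_logits: Dict[str, float], verbalizer: Dict[str, str]) -> Dict[str, float]:
    """Softmax restricted to the logits of the label words (one word per class)."""
    scores = {label: next_token_logits[word] for label, word in verbalizer.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def prompt_loss(next_token_logits: Dict[str, float], verbalizer: Dict[str, str], gold_label: str) -> float:
    """Cross-entropy of the gold class under the label-word distribution."""
    return -math.log(label_word_probs(next_token_logits, verbalizer)[gold_label])
```

For instance, with logits `{" yes": 2.0, " no": 0.0, " the": 5.0}` and verbalizer `{"positive": " yes", "negative": " no"}`, `label_word_probs` assigns about 0.88 to "positive"; the high-scoring " the" is ignored because it is not a label word. Reusing the pretrained language-modeling head this way, instead of training a new classifier head from scratch, is the usual motivation for prompt-based fine-tuning in few-shot settings.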