Labrador: Exploring the Limits of Masked Language Modeling for Laboratory Data
David R. Bellamy, Bhawesh Kumar, Cindy Wang, Andrew Beam
TL;DR
Labrador introduces a continuous Transformer architecture pretrained on a large corpus of lab measurements to learn representations from numeric EHR data. Despite strong intrinsic pre-training performance and effective lab value imputation, transfer learning to downstream clinical tasks yields limited gains, with XGBoost often outperforming the Transformers. The study finds that Labrador generally outperforms a BERT baseline but still struggles to surpass traditional tree-based methods, highlighting data-scale and data-generating-process limitations. The authors advocate multimodal, multimethod modeling and larger, harmonized datasets to realize the potential of foundation models for numerical EHR data.
Abstract
In this work, we introduce Labrador, a pre-trained Transformer model for laboratory data. Labrador and BERT were pre-trained on a corpus of 100 million lab test results from electronic health records (EHRs) and evaluated on various downstream outcome prediction tasks. Both models demonstrate mastery of the pre-training task, but neither consistently outperforms XGBoost on downstream supervised tasks. Our ablation studies reveal that transfer learning shows limited effectiveness for BERT and achieves marginal success with Labrador. We explore the reasons for the failure of transfer learning and suggest that, among other factors, the data generating process underlying each patient cannot be characterized sufficiently using labs alone. We encourage future work to focus on joint modeling of multiple EHR data categories and to include tree-based baselines in evaluations.
