
Health AI Developer Foundations

Atilla P. Kiraly, Sebastien Baur, Kenneth Philbrick, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Nick George, Fayaz Jamil, Jing Tang, Kai Bailey, Faruk Ahmed, Akshay Goel, Abbi Ward, Lin Yang, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Shravya Shetty, Daniel Golden, Shekoofeh Azizi, David F. Steiner, Yun Liu, Tim Thelin, Rory Pilgrim, Can Kirmizibayrak

TL;DR

While HAI-DEF, and the foundation models in particular, lower the barrier to entry for ML in healthcare, the authors emphasize the importance of validation with problem- and population-specific data for each intended usage setting.

Abstract

Robust medical machine learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost-prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer Foundations (HAI-DEF), a suite of pre-trained, domain-specific foundation models, tools, and recipes to accelerate building ML for health applications. The models cover various modalities and domains, including radiology (X-rays and computed tomography), histopathology, dermatological imaging, and audio. These models provide domain-specific embeddings that facilitate AI development with less labeled data, shorter training times, and reduced computational costs compared to traditional approaches. In addition, we use a common interface and style across these models, and prioritize usability to enable developers to integrate HAI-DEF efficiently. We present model evaluations across various tasks and conclude with a discussion of their application and evaluation, covering the importance of ensuring efficacy, fairness, and equity. Finally, while HAI-DEF and specifically the foundation models lower the barrier to entry for ML in healthcare, we emphasize the importance of validation with problem- and population-specific data for each desired usage setting. This technical report will be updated over time as more modalities and features are added.
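
The savings in labeled data and compute described above come from training only a small model on top of frozen embeddings, which the figure captions call a linear probe. The following is a minimal sketch of such a probe, assuming the embeddings have already been computed and are available as plain lists of floats. The function names, the gradient-descent settings, and the synthetic two-dimensional data are illustrative assumptions, not the HAI-DEF API or the authors' training setup.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(embeddings, labels, lr=0.1, epochs=200, l2=1e-3):
    """Fit L2-regularized logistic regression on frozen embeddings by full-batch gradient descent.

    The embeddings (foundation-model outputs) are fixed inputs; only the
    weight vector `w` and bias `b` are learned.
    """
    n, dim = len(embeddings), len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            # Gradient of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        for j in range(dim):
            w[j] -= lr * (grad_w[j] / n + l2 * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability of the positive class for one embedding."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Usage on synthetic 2-D "embeddings" (illustrative only, not HAI-DEF outputs).
rng = random.Random(0)
X = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(50)]
X += [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(50)]
y = [1] * 50 + [0] * 50
w, b = train_linear_probe(X, y)
print(predict_proba(w, b, [1.0, 1.0]), predict_proba(w, b, [-1.0, -1.0]))
```

Because the encoder is never updated, each training step touches only `dim + 1` parameters. In practice a library solver such as scikit-learn's `LogisticRegression` would normally replace the hand-written loop.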


Paper Structure

This paper contains 18 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Data efficiency comparison of four of our foundation models against established approaches. For CXR (upper left), the panel shows the average performance across six binary classification tasks for the original CXR Foundation model and the new ELIXR-B model. For Pathology (upper right), it covers a 4-class Gleason grading task on prostate cancer needle core biopsies and the detection of metastatic breast cancer in lymph nodes. For Derm (lower left), the evaluation is 28-way skin condition category identification, and for HeAR (lower right), it is COVID-19 detection from cough sounds. Notably, the foundation models achieve comparable results while using substantially less training data and compute. CT Foundation is covered below, because comparable methods were not available for an equivalent comparison.
  • Figure 2: Performance of ELIXR-C, ELIXR-B, and the original CXR Foundation embeddings for data-efficient classification. The ROC AUC results of a linear probe are shown averaged across 2 datasets (CheXpert and Chest X-ray14) for eight findings: atelectasis, cardiomegaly, airspace opacity, fracture, pneumothorax, consolidation, pleural effusion, and pulmonary edema. Both ELIXR-C and ELIXR-B outperform the original CXR Foundation at matching dataset sizes.
  • Figure 3: Performance, measured in ROC AUC, of the Path Foundation on 11 histopathology classification tasks via linear probing. The Path Foundation embeddings demonstrate superior performance compared to ImageNet features across all tasks.
  • Figure 4: ROC AUC of data-efficient classifiers trained on the SCIN dataset for a representative sample of dermatology tasks. The top panel shows results for a logistic regression classifier and the bottom panel shows results for a simple two-layer neural network.
  • Figure 5: Comparison of the performance of HeAR to other audio encoders on four cough tasks, based on a single random draw of the training data. The y-axis shows ROC AUC and the x-axis shows the percentage of training data used. HeAR performs favorably across data regimes and tasks, demonstrating its data efficiency.
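
The captions above report linear-probe ROC AUC as a function of how much training data is used. The following is a minimal, pure-Python sketch of that kind of evaluation, assuming binary labels and precomputed embeddings. `roc_auc` uses the rank (Mann-Whitney) formulation. `data_efficiency_curve` draws one random ordering per class and retrains on nested fractions of it, which is one possible reading of the Figure 5 note that results come from a single random draw of the training data. The mean-difference scorer `fit_mean_difference_probe` is a hypothetical stand-in for a trained logistic regression, chosen only to keep the example short. None of these names come from HAI-DEF.

```python
import random


def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half, which equals the area under the empirical ROC curve.
    O(P * N) pairwise comparison: fine for a sketch, slow for large test sets.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_mean_difference_probe(X, y):
    """Hypothetical stand-in for a trained linear probe: score(x) = x . (mean_pos - mean_neg)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def data_efficiency_curve(X_train, y_train, X_test, y_test, fractions,
                          fit=fit_mean_difference_probe, seed=0):
    """Test ROC AUC of a probe retrained on growing fractions of the training set."""
    rng = random.Random(seed)
    # One random ordering per class, drawn once and reused for every fraction,
    # so each smaller training set is nested inside the larger ones and both
    # classes are always present.
    by_class = [[i for i, t in enumerate(y_train) if t == c] for c in (0, 1)]
    for idx in by_class:
        rng.shuffle(idx)
    curve = []
    for frac in fractions:
        chosen = []
        for idx in by_class:
            chosen += idx[:max(1, round(frac * len(idx)))]
        score = fit([X_train[i] for i in chosen], [y_train[i] for i in chosen])
        curve.append((frac, roc_auc([score(x) for x in X_test], y_test)))
    return curve


# Usage on synthetic 2-D "embeddings" (illustrative only, not HAI-DEF outputs).
data_rng = random.Random(1)


def synthetic(n_per_class):
    X = [[data_rng.gauss(0.8, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(n_per_class)]
    X += [[data_rng.gauss(-0.8, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(n_per_class)]
    return X, [1] * n_per_class + [0] * n_per_class


X_tr, y_tr = synthetic(100)
X_te, y_te = synthetic(100)
curve = data_efficiency_curve(X_tr, y_tr, X_te, y_te, [0.05, 0.25, 1.0])
for frac, auc in curve:
    print(f"{frac:.0%} of training data -> test ROC AUC {auc:.3f}")
```

Nesting the subsets means that differences between points on the curve reflect added data rather than a different random sample. A single draw still leaves the curve sensitive to that draw, which is why repeated draws with confidence intervals are a common alternative.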