CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn Ball, Katie Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi, Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz, Bhavik N. Patel, Matthew P. Lungren, Andrew Y. Ng

TL;DR

CheXpert is a large public chest radiograph dataset with uncertainty labels and radiologist-annotated evaluation sets. It introduces an automatic labeler that extracts 14 observations from radiology reports and marks uncertain mentions, then compares several ways of using those uncertainty labels when training convolutional neural networks. No single approach wins everywhere: different uncertainty approaches are useful for different pathologies. On a 500-study test set annotated by a consensus of 5 board-certified radiologists, the best model's ROC and PR curves lie above the operating points of 3 additional radiologists for Cardiomegaly, Edema, and Pleural Effusion. The dataset is released as a standard benchmark for chest radiograph interpretation models.

Abstract

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert .
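The abstract mentions several approaches to using uncertainty labels but does not spell them out here. As a minimal sketch of what such approaches can look like, the example below resolves uncertain labels under three simple policies (mask them out of the loss, treat them as negative, or treat them as positive) and computes a masked binary cross-entropy. The `-1` encoding for "uncertain", the policy names, and both function names are assumptions made for illustration, not the authors' implementation.

```python
import math

UNCERTAIN = -1  # assumed encoding for an uncertain mention (hypothetical)


def resolve_uncertain(labels, policy):
    """Map raw labels (1, 0, or UNCERTAIN) to (targets, mask).

    policy: "ignore" masks uncertain entries out of the loss,
            "zeros" treats them as negative,
            "ones" treats them as positive.
    """
    if policy not in ("ignore", "zeros", "ones"):
        raise ValueError(f"unknown policy: {policy}")
    targets, mask = [], []
    for y in labels:
        if y == UNCERTAIN:
            if policy == "ignore":
                targets.append(0.0)  # placeholder value, masked out below
                mask.append(0.0)
            else:
                targets.append(1.0 if policy == "ones" else 0.0)
                mask.append(1.0)
        else:
            targets.append(float(y))
            mask.append(1.0)
    return targets, mask


def masked_bce(probs, targets, mask, eps=1e-7):
    """Mean binary cross-entropy over the entries where mask == 1."""
    total, count = 0.0, 0.0
    for p, t, m in zip(probs, targets, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += m * -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        count += m
    return total / count if count else 0.0


if __name__ == "__main__":
    raw = [1, 0, UNCERTAIN]
    probs = [0.5, 0.5, 0.9]
    for policy in ("ignore", "zeros", "ones"):
        targets, mask = resolve_uncertain(raw, policy)
        print(policy, round(masked_bce(probs, targets, mask), 4))
```

Under "ignore" the uncertain observation contributes nothing to the loss, while "zeros" and "ones" commit it to one class. Which choice helps can differ per observation, which is consistent with the abstract's finding that different approaches suit different pathologies.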

Paper Structure

This paper contains 34 sections, 1 equation, 4 figures, 3 tables.

Figures (4)

  • Figure 1: The CheXpert task is to predict the probability of different observations from multi-view chest radiographs.
  • Figure 2: Output of the labeler when run on a report sampled from our dataset. In this case, the labeler correctly extracts all of the mentions in the report (underlined) and classifies the uncertainties (bolded) and negations (italicized).
  • Figure 3: We compare the performance of 3 radiologists to the model against the test set ground truth in both the ROC and the PR space. We examine whether the radiologist operating points lie below the curves to determine if the model is superior to the radiologists. We also compute the lower (LabelL) and upper bounds (LabelU) of the performance of the labels extracted automatically from the radiology report using our labeling system against the test set ground truth.
  • Figure 4: The final model localizes findings in radiographs using Gradient-weighted Class Activation Mappings. The interpretation of the radiographs in the subcaptions is provided by a board-certified radiologist.
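The comparison described in the Figure 3 caption, checking whether a radiologist's operating point lies below the model's ROC curve, can be sketched as follows. This is an illustrative sketch only: the function names, the linear interpolation between ROC points, and the toy scores are assumptions, not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr), from (0, 0) to (1, 1), by sweeping thresholds."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def tpr_at(points, fpr):
    """Highest TPR the curve reaches at the given FPR (linear interpolation)."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = max(y0, y1)  # vertical segment: take its top
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def point_below_curve(points, reader_fpr, reader_tpr):
    """True if the reader's (FPR, TPR) operating point lies strictly below the curve."""
    return reader_tpr < tpr_at(points, reader_fpr)


if __name__ == "__main__":
    # Toy model scores and ground-truth labels (made up for illustration).
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    labels = [1, 1, 0, 1, 0, 0]
    curve = roc_points(scores, labels)
    print(point_below_curve(curve, 1 / 3, 0.9))
    print(point_below_curve(curve, 0.0, 0.9))
```

The same check carries over to PR space by replacing (FPR, TPR) with (recall, precision), although linear interpolation between PR points is only a rough approximation.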