Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

TL;DR

The paper tackles the challenge of long-tailed, multi-label disease classification in chest X-rays by releasing the CXR-LT benchmark and a gold-standard test subset, enabling large-scale study of label imbalance and label co-occurrence. It analyzes top-performing solutions from the CXR-LT challenge, identifying common strategies such as high-resolution input, domain-specific pretraining, ensembling, loss re-weighting, and multimodal label representations. The study finds that while the top methods achieve strong overall performance, tail-class accuracy remains a bottleneck and is highly sensitive to labeling noise and distribution shifts between automatic and human annotations. It then outlines practical recommendations and argues for a path toward zero-shot disease classification using vision-language foundation models, which could robustly generalize to unseen findings by leveraging textual label representations and cross-modal learning. Collectively, the work contributes a valuable dataset, an in-depth analysis of competing approaches, and a forward-looking framework for scalable, zero-shot medical image understanding in long-tailed, multi-label contexts.

Abstract

Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
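
To make the label-imbalance problem concrete, the following is a minimal sketch of one common form of loss re-weighting, a strategy the TL;DR lists among the top solutions. It is not taken from any participating team's code: the function names, the negatives-to-positives weighting rule, and the toy labels are all assumptions chosen for illustration. The idea is similar in spirit to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`.

```python
import math
from typing import List


def inverse_frequency_weights(labels: List[List[int]]) -> List[float]:
    """Per-class positive weight = (#negatives / #positives).

    Hypothetical weighting rule for illustration; rare (tail) classes
    receive larger weights than common (head) classes.
    """
    n_images = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        positives = sum(row[c] for row in labels)
        # Classes with no positives fall back to a neutral weight of 1.0.
        weights.append((n_images - positives) / positives if positives else 1.0)
    return weights


def weighted_bce(probs: List[List[float]],
                 targets: List[List[int]],
                 pos_weights: List[float],
                 eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all (image, class) pairs,
    with the positive term of each class scaled by its weight."""
    total, count = 0.0, 0
    for p_row, t_row in zip(probs, targets):
        for p, t, w in zip(p_row, t_row, pos_weights):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy multi-label data: class 0 is common (head), class 1 is rare (tail).
toy_labels = [[1, 0], [1, 0], [1, 1], [0, 0]]
toy_weights = inverse_frequency_weights(toy_labels)

# An image carrying both findings: missing the rare one vs. missing the common one.
miss_rare = weighted_bce([[0.9, 0.1]], [[1, 1]], toy_weights)
miss_common = weighted_bce([[0.1, 0.9]], [[1, 1]], toy_weights)
```

Under this weighting, a missed positive for the rare class is penalized more heavily than a missed positive for the common class, which is the intended effect of re-weighting in a long-tailed setting.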

Paper Structure (30 sections, 8 figures, 8 tables)

Figures (8)

  • Figure 1: Long-tailed distribution of the CXR-LT 2023 challenge dataset. The dataset was formed by extending the MIMIC-CXR benchmark (Johnson et al., 2019) to include 12 new clinical findings (red) by parsing radiology reports.
  • Figure 1: Long-tailed distribution of the CXR-LT gold standard test set.
  • Figure 2: Flowchart describing CXR-LT gold standard dataset annotation.
  • Figure 2: Heatmap displaying the difference in class co-occurrence tendencies between automatically text-mined and manually human-annotated labels. Each entry depicts the difference in conditional probability of the x-axis finding given the y-axis finding between the gold standard human labels and automatically text-mined labels.
  • Figure 3: Flowchart describing CXR-LT challenge participation. Over 200 teams applied to participate in the challenge on CodaLab, and 59 teams met registration requirements. Of the 17 teams that participated in the Test Phase, 11 submitted their written solutions for presentation at the ICCV CVAMD 2023 workshop. The top 9 of these submissions were accepted to the workshop and are described in this paper.
  • ...and 3 more figures
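
The quantity described in the co-occurrence heatmap caption (the difference in the conditional probability of one finding given another, between gold standard human labels and text-mined labels) can be sketched as follows. This is an illustrative reconstruction from the caption alone, not the authors' code; the function names and the toy label matrices are assumptions.

```python
from typing import List


def conditional_cooccurrence(labels: List[List[int]]) -> List[List[float]]:
    """Return M where M[y][x] = P(finding x present | finding y present).

    `labels` is a binary matrix with one row per image and one column
    per finding. Rows of M for findings that never occur are NaN.
    """
    n_classes = len(labels[0])
    # counts[y][x] = number of images in which both y and x are positive.
    counts = [[0] * n_classes for _ in range(n_classes)]
    for row in labels:
        positives = [i for i, value in enumerate(row) if value]
        for y in positives:
            for x in positives:
                counts[y][x] += 1
    matrix = []
    for y in range(n_classes):
        n_y = counts[y][y]  # number of images with finding y
        matrix.append([
            counts[y][x] / n_y if n_y else float("nan")
            for x in range(n_classes)
        ])
    return matrix


def cooccurrence_difference(gold: List[List[int]],
                            mined: List[List[int]]) -> List[List[float]]:
    """Entry [y][x] = P_gold(x | y) - P_mined(x | y)."""
    p_gold = conditional_cooccurrence(gold)
    p_mined = conditional_cooccurrence(mined)
    return [
        [g - m for g, m in zip(row_g, row_m)]
        for row_g, row_m in zip(p_gold, p_mined)
    ]


# Toy example: 4 images, 3 findings (hypothetical data).
gold_labels = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
mined_labels = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
diff = cooccurrence_difference(gold_labels, mined_labels)
```

In this toy data the text-mined labels miss finding 1 on the first image, so the gold labels show finding 1 co-occurring with finding 0 more often: `diff[0][1]` is positive. Entries near zero indicate that both label sources agree on how often the two findings appear together.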