Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge
Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng
TL;DR
The paper tackles the challenge of long-tailed, multi-label disease classification in chest X-rays by releasing the CXR-LT benchmark and a gold-standard test subset, enabling large-scale study of label imbalance and label co-occurrence. It analyzes top-performing solutions from the CXR-LT challenge, identifying common strategies such as high-resolution input, domain-specific pretraining, ensembling, loss re-weighting, and multimodal label representations. The study finds that while the top methods achieve strong overall performance, tail-class accuracy remains a bottleneck and is highly sensitive to labeling noise and distribution shifts between automatic and human annotations. It then outlines practical recommendations and argues for a path toward zero-shot disease classification using vision-language foundation models, which could robustly generalize to unseen findings by leveraging textual label representations and cross-modal learning. Collectively, the work contributes a valuable dataset, an in-depth analysis of competing approaches, and a forward-looking framework for scalable, zero-shot medical image understanding in long-tailed, multi-label contexts.
Abstract
Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed": there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of long-tailed learning in medical image recognition, few have studied the interaction of label imbalance and label co-occurrence posed by long-tailed, multi-label disease classification. To engage with the research community on this emerging topic, we conducted an open challenge, CXR-LT, on long-tailed, multi-label thorax disease classification from chest X-rays (CXRs). We publicly release a large-scale benchmark dataset of over 350,000 CXRs, each labeled with at least one of 26 clinical findings following a long-tailed distribution. We synthesize common themes of top-performing solutions, providing practical recommendations for long-tailed, multi-label medical image classification. Finally, we use these insights to propose a path forward involving vision-language foundation models for few- and zero-shot disease classification.
