
More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis

Xiangyu Peng, Kai Wang, Jianfei Yang, Yingying Zhu, Yang You

TL;DR

This work addresses the mismatch between binary chest X-ray (CXR) diagnosis and the clinical variability found within positive findings. It introduces a fine-grained benchmark that splits positive findings into atypical and typical positives based on severity and temporal change, and it quantifies how well a model separates the two with the metric AUC$^{\text{FG}}$. To learn this granularity from coarse labels alone, the authors propose PU-RM, a risk-modulated training framework built on the Partially Huberised Cross Entropy loss with a tangent point $\tau$. On the MIMIC-CXR-JPG dataset, PU-RM achieves higher AUC$^{\text{FG}}$ on consolidation and edema than baseline uncertainty-handling methods, and class activation map (CAM) visualizations show fewer spurious activations than the baseline. Together, these results offer a practical baseline and a step toward AI diagnoses that communicate clinically meaningful fine-grained knowledge.
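The risk-modulation idea can be illustrated with a minimal Python sketch of a partially Huberised cross entropy. It assumes the tangent point $\tau$ is a probability in (0, 1): when the probability p of the labeled class is at least $\tau$, the loss is the usual -log p; below $\tau$ it follows the tangent line of -log p at p = $\tau$, so the penalty on hard, low-confidence samples grows only linearly. This parameterization and the helper names `phuber_ce`, `binary_phuber_ce`, and `stable_sigmoid` are assumptions for illustration; the exact PU-RM formulation in the paper may differ.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def phuber_ce(p: float, tau: float) -> float:
    """Partially Huberised cross entropy for the probability p of the labeled class.

    Assumed parameterization: tangent point tau in (0, 1).
    - p >= tau: standard cross entropy, -log(p).
    - p <  tau: tangent line of -log(p) at p = tau, i.e. -log(tau) - p / tau + 1,
      so the loss stays finite at p = 0 and its slope is bounded by 1 / tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in the open interval (0, 1)")
    if p >= tau:
        return -math.log(p)
    return -math.log(tau) - p / tau + 1.0


def binary_phuber_ce(logit: float, label: int, tau: float) -> float:
    """Apply phuber_ce to one binary prediction (label 1 = positive, 0 = negative)."""
    p_pos = stable_sigmoid(logit)
    p_label = p_pos if label == 1 else 1.0 - p_pos
    return phuber_ce(p_label, tau)


if __name__ == "__main__":
    # A hard positive: the model assigns only 1% probability to the positive label.
    p, tau = 0.01, 0.1
    print(f"CE        : {-math.log(p):.4f}")
    print(f"PHuber-CE : {phuber_ce(p, tau):.4f}")
```

For a hard positive with p = 0.01 and $\tau$ = 0.1, this sketch gives a loss of about 3.20 instead of the CE value of about 4.61, and the gradient magnitude with respect to p is capped at 1/$\tau$ = 10 instead of 1/p = 100. Capping the gradient this way is what keeps a few hard, possibly atypical, positives from dominating training.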

Abstract

With the advance of deep learning, much progress has been made in building powerful artificial intelligence (AI) systems for automatic Chest X-ray (CXR) analysis. Most existing AI models are trained as binary classifiers that distinguish positive from negative cases. However, a large gap exists between this simple binary setting and complicated real-world medical scenarios. In this work, we revisit the problem of automatic radiology diagnosis. We first observe that cases within the positive class are considerably diverse, so simply classifying them all as positive discards many important details. This motivates us to build AI models that, like human experts, can communicate fine-grained knowledge from medical images. To this end, we propose a new benchmark for learning fine granularity from medical images. Specifically, we devise a division rule based on medical knowledge that divides positive cases into two subcategories, namely atypical positive and typical positive. We then propose a new metric, termed AUC$^\text{FG}$, that evaluates a model's ability to separate the two subcategories. With the proposed benchmark, we encourage the community to develop AI diagnosis systems that better learn fine granularity from medical images. Finally, we propose a simple risk modulation approach to this problem that uses only coarse labels in training. Empirical results show that, despite its simplicity, the proposed method achieves superior performance and thus serves as a strong baseline.
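One plausible way to compute AUC$^\text{FG}$, sketched below in Python, is an ordinary ROC AUC restricted to cases labeled positive, with typical positives treated as the positive class and atypical positives as the negative class. Under this reading, a model that ranks typical cases above atypical ones scores close to 1. The interpretation of the metric and the function name `auc_fg` are assumptions; the pairwise (Mann-Whitney) formulation is used only because it needs no third-party packages.

```python
from typing import Sequence


def auc_fg(scores: Sequence[float], subtypes: Sequence[str]) -> float:
    """Assumed AUC^FG: ROC AUC separating typical from atypical positives.

    scores   -- model probabilities for the finding (higher = more positive)
    subtypes -- "typical", "atypical", or any other tag (e.g. "negative"),
                which is ignored because the metric uses positives only
    """
    if len(scores) != len(subtypes):
        raise ValueError("scores and subtypes must have the same length")
    typical = [s for s, t in zip(scores, subtypes) if t == "typical"]
    atypical = [s for s, t in zip(scores, subtypes) if t == "atypical"]
    if not typical or not atypical:
        raise ValueError("need at least one typical and one atypical positive")

    # Probability that a random typical positive outranks a random atypical one;
    # ties count as half a win (Mann-Whitney U statistic normalized to [0, 1]).
    wins = 0.0
    for t_score in typical:
        for a_score in atypical:
            if t_score > a_score:
                wins += 1.0
            elif t_score == a_score:
                wins += 0.5
    return wins / (len(typical) * len(atypical))


if __name__ == "__main__":
    scores = [0.92, 0.81, 0.44, 0.30, 0.05]
    subtypes = ["typical", "typical", "atypical", "atypical", "negative"]
    print(auc_fg(scores, subtypes))
```

The nested loop is quadratic, so it only suits small evaluation sets; for larger ones, `sklearn.metrics.roc_auc_score` applied to the same subset (typical = 1, atypical = 0) yields the same value.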

Paper Structure (23 sections, 3 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Comparison of the Cross-Entropy (CE) loss and our method in medical image diagnosis. The CE loss treats positive samples as hard samples, which leads to model overfitting. Moreover, it ignores the discrepancy between atypical positive and typical positive. We propose to modulate the risk in the CE loss for hard samples, which prevents the model from overfitting and improves its ability to learn fine granularity within the positive class.
  • Figure 2: Example CXR images with corresponding reports and text-mined labels. Although subfigures (a) and (b) are both labeled positive, they differ substantially. Subfigure (a) is a typical positive case with severe symptoms of consolidation, while subfigure (b) is an atypical case in which the patient's condition has substantially improved and few symptoms remain. Though labeled positive, the case in (b) lies close to the boundary of negative in human expert evaluation.
  • Figure 3: Loss curves for the CE loss and PCE losses with different values of $\tau$.
  • Figure 4: Ablation study of $\tau$ on the validation set for consolidation and edema. The lines and bands show the mean and standard deviation over three runs, respectively.
  • Figure 5: Visualization of model predictions using CAM. Subfigure (a) shows the original medical image, which is considered atypical positive according to the proposed division rule in the paper's atypical-positive division table. The baseline U-Ones model trained with the CE loss falsely activates in the right lower lobe area, as shown in subfigure (b). In contrast, our method, shown in subfigure (c), does not focus on any specific area. It also predicts a much lower positive probability (0.01) than the baseline model (0.44).
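Figure 2 contrasts a severe, typical case with one whose condition has substantially improved. The toy Python sketch below illustrates how a report-based rule along the severity and temporal-change axes might route a case that is already labeled positive to one of the two subcategories. The keyword lists, the function name `split_positive`, and the decision logic are hypothetical placeholders, not the paper's division rule, which is specified in its own table; a realistic version would also need negation handling (for example, "no improvement").

```python
import re

# Hypothetical keyword lists for illustration only; NOT the paper's division rule.
SEVERE_TERMS = {"severe", "moderate", "extensive", "worsening", "worsened"}
MILD_TERMS = {"mild", "minimal", "slight", "trace", "subtle"}
IMPROVING_TERMS = {"improved", "improving", "resolving", "resolved"}


def split_positive(report: str) -> str:
    """Assign a report already labeled positive to 'typical' or 'atypical'.

    Toy logic: a case is atypical if the report describes improvement over time,
    or describes only mild findings with no severe descriptor; otherwise typical.
    There is no negation handling, so "no improvement" would be misclassified.
    """
    words = set(re.findall(r"[a-z]+", report.lower()))
    improving = bool(words & IMPROVING_TERMS)
    mild_only = bool(words & MILD_TERMS) and not (words & SEVERE_TERMS)
    return "atypical" if improving or mild_only else "typical"


if __name__ == "__main__":
    for text in ["Severe consolidation in the right lower lobe.",
                 "Consolidation has substantially improved since the prior study."]:
        print(split_positive(text), "|", text)
```

Keyword matching is used here only because it is easy to follow; the paper's rule is grounded in medical knowledge and may rely on different report attributes entirely.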