Clinically-Inspired Hierarchical Multi-Label Classification of Chest X-rays with a Penalty-Based Loss Function

Mehrdad Asadi, Komi Sodoké, Ian J. Gerard, Marta Kersten-Oertel

TL;DR

This work tackles multi-label chest X-ray classification by introducing clinically-inspired hierarchical label groupings and a novel HBCE loss that enforces parent–child dependencies. The approach uses a single-model, single-run pipeline based on DenseNet121, extended with a hierarchical loss term $L_{HBCE} = L_{BCE} + \lambda \sum_{p,c} P_{p,c}$, where penalties can be fixed or data-driven to reflect empirical label dependencies. It achieves a weighted AUROC of 0.903 on CheXpert, and employs Monte Carlo uncertainty estimation and Grad-CAM visualizations to enhance interpretability, with a public code release. The results indicate data-driven penalties can improve performance on high-level categories and several pathologies, supporting the clinical utility of hierarchical, interpretable multi-label CXR classification in real-world settings.

Abstract

In this work, we present a novel approach to multi-label chest X-ray (CXR) image classification that enhances clinical interpretability while maintaining a streamlined, single-model, single-run training pipeline. Leveraging the CheXpert dataset and VisualCheXbert-derived labels, we incorporate hierarchical label groupings to capture clinically meaningful relationships between diagnoses. To achieve this, we designed a custom hierarchical binary cross-entropy (HBCE) loss function that enforces label dependencies using either fixed or data-driven penalty types. Our model achieved a mean area under the receiver operating characteristic curve (AUROC) of 0.903 on the test set. Additionally, we provide visual explanations and uncertainty estimations to further enhance model interpretability. All code, model configurations, and experiment details are made available.
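
To make the shape of the HBCE loss concrete, the following is a minimal sketch in plain Python of a loss of the form $L_{HBCE} = L_{BCE} + \lambda \sum_{p,c} P_{p,c}$. The exact definition of the penalty $P_{p,c}$ is not given in this summary, so the form used here (a weight times the amount by which a child probability exceeds its parent probability) is an assumption, as is estimating the data-driven weight from the empirical frequency of the parent given the child. The function names, the label indices, and the value of `lam` are all hypothetical and chosen only for illustration.

```python
import math


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def conditional_weights(label_rows, hierarchy):
    """Assumed data-driven weights: empirical P(parent = 1 | child = 1)."""
    weights = {}
    for parent, child in hierarchy:
        child_pos = [row for row in label_rows if row[child] == 1]
        if not child_pos:
            weights[(parent, child)] = 0.0
            continue
        both = sum(1 for row in child_pos if row[parent] == 1)
        weights[(parent, child)] = both / len(child_pos)
    return weights


def hbce(probs, targets, hierarchy, lam=0.5, weights=None):
    """BCE plus lam times the summed parent-child penalties.

    Assumed penalty: w_{p,c} * max(0, probs[c] - probs[p]), i.e. a child
    predicted more confidently than its parent is penalized.
    With weights=None every pair uses a fixed weight of 1.0.
    """
    penalty = 0.0
    for parent, child in hierarchy:
        w = 1.0 if weights is None else weights[(parent, child)]
        penalty += w * max(0.0, probs[child] - probs[parent])
    return bce(probs, targets) + lam * penalty


# Illustrative indices: 0 = parent group, 1 and 2 = its child pathologies.
hierarchy = [(0, 1), (0, 2)]
consistent = hbce([0.9, 0.8, 0.7], [1, 1, 1], hierarchy)
inconsistent = hbce([0.2, 0.8, 0.7], [1, 1, 1], hierarchy)
```

In this sketch a prediction in which the parent is at least as probable as each child incurs no penalty, so `consistent` equals the plain BCE, while `inconsistent` is raised by `lam` times the summed violations. Passing the output of `conditional_weights` as `weights` switches from the fixed to the assumed data-driven variant.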

Paper Structure

This paper contains 23 sections, 4 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Clinically-Inspired Taxonomy
  • Figure 2: AUROC curves for the Hierarchical training strategy with a Data-Driven Penalty and a scale factor of 0.5 on the CheXpert dataset.
  • Figure 3: Activations on the sample test image. Top row: defined high-level and "Uncertain" labels. Middle and bottom rows: original labels in the CheXpert dataset. Activations are normalized and clipped with a 0.5 threshold, and the color map is segmented to enhance both localization and interpretation. In this example, Fluid Accumulation shows high activation over potential fluid-affected areas, consistent with findings in Pleural Effusion and Edema. In the bottom row, child pathologies such as Cardiomegaly, Lung Opacity, and Enlarged Cardiomediastinum show targeted focus in the heart region and other expected locations, validating the model’s spatial attention. Prediction confidence and uncertainty for $N=10$ Monte Carlo iterations are shown on top of each sub-image. Ground truth: Atelectasis, Cardiomegaly, Edema, Enlarged Cardiomediastinum, Lung Opacity, Pleural Effusion.
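
The per-label confidence and uncertainty reported for $N=10$ Monte Carlo iterations can be illustrated with a short sketch. This summary does not say how the stochastic predictions are produced or which statistics are reported, so the sketch assumes a predictor that returns slightly different probabilities on each call (for example, a network with dropout left active at inference) and summarizes the runs by their mean and standard deviation. `mc_estimate` and `toy_predict` are hypothetical names, and `toy_predict` is a stand-in for a real model.

```python
import random
import statistics


def mc_estimate(stochastic_predict, image, n_iter=10):
    """Run n_iter stochastic forward passes and summarize each label.

    Returns (confidence, uncertainty): the per-label mean and the
    per-label population standard deviation across the passes.
    """
    samples = [stochastic_predict(image) for _ in range(n_iter)]
    per_label = list(zip(*samples))  # regroup: one tuple of values per label
    confidence = [statistics.fmean(vals) for vals in per_label]
    uncertainty = [statistics.pstdev(vals) for vals in per_label]
    return confidence, uncertainty


def toy_predict(image, rng=random.Random(0)):
    """Hypothetical stand-in for a model with stochastic inference."""
    base = [0.9, 0.2]  # illustrative probabilities for two labels
    return [min(1.0, max(0.0, b + rng.uniform(-0.05, 0.05))) for b in base]


confidence, uncertainty = mc_estimate(toy_predict, image=None, n_iter=10)
```

A label whose predictions barely change across passes gets an uncertainty near zero, while a label whose predictions fluctuate gets a larger value, which is the kind of signal the figure annotations are meant to convey.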