Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling

Philip Müller, Felix Meissen, Georgios Kaissis, Daniel Rueckert

TL;DR

This work tackles the challenge of localizing pathologies in chest X-rays using only image-level labels. It introduces Weakly Supervised ROI Proposal Networks (WSRPN), a differentiable, end-to-end system that learns bounding box proposals via ROI attention and Gaussian ROI pooling. These components sit within a two-branch MIL framework (a patch branch and an ROI branch) whose branches are coupled by a consistency loss. On ChestXray-8, WSRPN achieves state-of-the-art results on RoDeO, AP, and localization metrics. Extensive ablations show that components such as the loss terms, the ROI tokens, and the Gaussian pooling scheme are each necessary. The approach enables end-to-end optimization of box parameters under weak supervision, offering practical clinical value and potential extensions to multimodal or semi-supervised settings.
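Gaussian ROI pooling is what allows box parameters to receive gradients: instead of cropping a hard rectangle, each patch contributes with a smooth weight that depends on the box. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It makes several assumptions. Boxes are parameterized as (cx, cy, w, h) in normalized [0, 1] image coordinates. The Gaussian's standard deviation is a fixed fraction of the box size, set by the hypothetical `sigma_scale` parameter. The weights are normalized to sum to one.

```python
import math
from typing import List, Sequence

Grid = List[List[List[float]]]  # H x W grid of D-dimensional patch features


def gaussian_roi_pool(features: Grid, box: Sequence[float], sigma_scale: float = 0.5) -> List[float]:
    """Soft-pool patch features with a 2D Gaussian centered on a box.

    Assumptions (illustrative only): box = (cx, cy, w, h) in normalized
    coordinates, and the Gaussian std along each axis is sigma_scale * box size.
    """
    height, width, dim = len(features), len(features[0]), len(features[0][0])
    cx, cy, bw, bh = box
    sx = max(sigma_scale * bw, 1e-6)  # floor avoids division by zero
    sy = max(sigma_scale * bh, 1e-6)

    # Unnormalized Gaussian weight for each patch, evaluated at the patch center.
    weights = []
    for m in range(height):
        y = (m + 0.5) / height
        row = []
        for n in range(width):
            x = (n + 0.5) / width
            row.append(math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2)))
        weights.append(row)

    total = sum(sum(row) for row in weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; box is too small or far from the grid")

    # Weighted average of the patch features (weights normalized to sum to 1).
    pooled = [0.0] * dim
    for m in range(height):
        for n in range(width):
            w = weights[m][n] / total
            for d in range(dim):
                pooled[d] += w * features[m][n][d]
    return pooled
```

Because every weight is a smooth function of (cx, cy, w, h), writing the same computation in an autodiff framework would let a loss on the pooled ROI feature move the box itself. A hard crop would not allow this, since its boundary is not differentiable.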

Abstract

Weakly supervised object detection (WSup-OD) increases the usefulness and interpretability of image classification algorithms without requiring additional supervision. The successes of multiple instance learning in this task for natural images, however, do not translate well to medical images due to the very different characteristics of their objects (i.e., pathologies). In this work, we propose Weakly Supervised ROI Proposal Networks (WSRPN), a new method for generating bounding box proposals on the fly using a specialized region-of-interest attention (ROI-attention) module. WSRPN integrates well with classic backbone-head classification algorithms and is end-to-end trainable with only image-level label supervision. We experimentally demonstrate that our new method outperforms existing methods in the challenging task of disease localization in chest X-ray images. Code: https://github.com/philip-mueller/wsrpn
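In a multiple instance learning (MIL) classifier, per-instance class scores (for patches or ROIs) must be aggregated into one image-level score that can be trained against the image label. The sketch below uses log-sum-exp (LSE) pooling, a common smooth aggregation that moves from mean pooling (small `r`) toward max pooling (large `r`). Which aggregation WSRPN actually uses is not stated in this summary. The function name `lse_pool` and the sharpness parameter `r` are illustrative assumptions.

```python
import math
from typing import Sequence


def lse_pool(instance_logits: Sequence[float], r: float = 5.0) -> float:
    """Aggregate instance logits into one image-level logit via LSE pooling.

    Computes (1/r) * log(mean(exp(r * x_i))), factoring out the maximum
    logit first so that exp() cannot overflow.
    """
    if not instance_logits:
        raise ValueError("need at least one instance")
    if r <= 0:
        raise ValueError("r must be positive")
    peak = max(instance_logits)
    mean_exp = sum(math.exp(r * (x - peak)) for x in instance_logits) / len(instance_logits)
    return peak + math.log(mean_exp) / r


def sigmoid(z: float) -> float:
    """Map an image-level logit to a class probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Usage: four patch logits for one pathology; one patch is strongly positive.
patch_logits = [-2.0, -1.5, 3.0, -2.2]
image_probability = sigmoid(lse_pool(patch_logits, r=5.0))
```

The pooled value always lies between the mean and the maximum of the instance logits. This lets a single confident instance drive the image-level prediction while still passing gradient to the other instances.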

Paper Structure

This paper contains 39 sections, 12 equations, 6 figures, and 8 tables.

Figures (6)

  • Figure 1: Schematic illustration of the MIL-based and CAM-based approaches and of our novel WSRPN approach.
  • Figure 2: Overview of our model architecture. We show the patch branch (blue) and the ROI branch (purple), each with the encoding steps, MIL classification and aggregation, and the loss functions. Components typically used in a MIL model are colored in blue. Our key contributions are outlined with bold lines. "sw" stands for shared weights. Yellow denotes parts of the bounding box prediction.
  • Figure 3: ROI attention component from our ROI branch. Using cross-attention, ROI tokens $\{\bm{q}_k\}$ gather relevant information from the patch features $\{\bm{h}^{\mathcal{P}}_{m, n}\}$ to compute the ROI features $\{\hat{\bm{h}}^\mathcal{R}_{k}\}$.
  • Figure 4: Comparison of the results per pathology between our method WSRPN and the best baseline on the bootstrapped ($N=250$) test set. On five pathologies (atelectasis, cardiomegaly, effusion, mass, and nodule), WSRPN performs significantly better; on pneumothorax, it is competitive with the baselines; and on two pathologies (infiltration and pneumonia), it performs worse.
  • Figure 5: Confusion matrix for our proposed WSRPN. The matrix was generated from the one-to-one correspondences between predicted and ground-truth boxes after the matching step in RoDeO.
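Figure 3 describes ROI attention as cross-attention: each ROI token $\bm{q}_k$ acts as a query over the patch features $\bm{h}^{\mathcal{P}}_{m,n}$ and returns an ROI feature $\hat{\bm{h}}^{\mathcal{R}}_k$. Below is a minimal pure-Python sketch of that mechanism. It uses single-head scaled dot-product attention and, for brevity, omits the learned query/key/value projections, multiple heads, and normalization layers that a real module would likely include. Those simplifications, and the function name `roi_attention`, are assumptions rather than the paper's design. The patch grid is flattened into a list of M·N vectors.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def roi_attention(
    roi_tokens: Sequence[Sequence[float]],      # K query tokens q_k, each of dim D
    patch_features: Sequence[Sequence[float]],  # M*N flattened patch features, dim D
) -> Tuple[List[Vector], List[Vector]]:
    """Cross-attention from ROI tokens to patch features (no learned projections).

    Returns (roi_features, attention_maps): roi_features[k] is the
    attention-weighted average of the patch features for token k, and
    attention_maps[k] holds that token's softmax weights over the patches.
    """
    dim = len(patch_features[0])
    scale = 1.0 / math.sqrt(dim)
    roi_features, attention_maps = [], []
    for q in roi_tokens:
        # Scaled dot-product similarity between this token and every patch.
        scores = [scale * sum(qi * hi for qi, hi in zip(q, h)) for h in patch_features]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Gather information: weighted sum of patch features per dimension.
        pooled = [sum(w * h[d] for w, h in zip(weights, patch_features)) for d in range(dim)]
        roi_features.append(pooled)
        attention_maps.append(weights)
    return roi_features, attention_maps
```

Each attention map is a distribution over patch locations, so a token whose query aligns with features from one image region will produce an ROI feature dominated by that region. In the full model the ROI tokens are presumably learned parameters, so different tokens can specialize to different regions.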