Combining Image- and Geometric-based Deep Learning for Shape Regression: A Comparison to Pixel-level Methods for Segmentation in Chest X-Ray

Ron Keuth, Mattias Heinrich

TL;DR

This work tackles chest X-ray segmentation by integrating geometric shape reasoning into deep learning through a hybrid pipeline that pairs a CNN with a geometric neural network. A lightweight CNN backbone extracts image features, the geometric neural network processes a 2D landmark point cloud sampled at an initial shape, and a shared MLP head predicts the final shape, either directly or as a relative displacement per landmark. Compared to pixel-level baselines on the JSRT dataset, the shape-based approach with a Point Transformer is competitive when using the same backbone and is more robust to image distortions, starting to outperform the convolutional decoders at around 30% corruption. While nnU-Net remains the top performer overall, the study highlights the practical benefits of shape-based segmentation, including fewer anatomically implausible predictions and the potential for human-in-the-loop refinement, and it motivates further work with stronger backbones and cascade strategies.

Abstract

When solving a segmentation task, shape-based methods can be beneficial compared to pixelwise classification because they model the target object geometrically as a shape, which prevents anatomically implausible predictions, in particular for corrupted data. In this work, we propose a novel hybrid method that combines a lightweight CNN backbone with a geometric neural network (Point Transformer) for shape regression. Using the same CNN encoder, the Point Transformer reaches segmentation quality on par with current state-of-the-art convolutional decoders ($4\pm1.9$ vs $3.9\pm2.9$ error in mm and $85\pm13$ vs $88\pm10$ Dice), but crucially, is more stable w.r.t. image distortion, starting to outperform them at a corruption level of 30%. Furthermore, we include the nnU-Net as an upper baseline, which has $3.7\times$ more trainable parameters than our proposed method.
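
The data flow of the hybrid design (image features sampled at the landmarks of an initial shape, then a shared head that regresses a displacement per landmark) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the nested-list feature map, and the single linear layer standing in for the shared MLP are all assumptions made for illustration, and the geometric network (a Point Transformer in the paper) is deliberately left out.

```python
def bilinear_sample(fmap, x, y):
    """Sample a feature map fmap[channel][row][col] at a continuous pixel
    position (x, y); coordinates are clamped to the map borders."""
    h, w = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return [
        (1 - dy) * ((1 - dx) * ch[y0][x0] + dx * ch[y0][x1])
        + dy * ((1 - dx) * ch[y1][x0] + dx * ch[y1][x1])
        for ch in fmap
    ]


def shared_head(point_features, weights, bias):
    """Apply the same linear layer to every point (a one-layer stand-in
    for a shared MLP; the weights are hypothetical)."""
    return [
        [sum(w * f for w, f in zip(row, feats)) + b
         for row, b in zip(weights, bias)]
        for feats in point_features
    ]


def regress_shape(fmap, initial_shape, weights, bias):
    """Move each landmark of the initial shape by a predicted displacement."""
    # Point cloud: landmark coordinates concatenated with sampled features.
    cloud = [[x, y] + bilinear_sample(fmap, x, y) for x, y in initial_shape]
    # A geometric network would exchange information between points here;
    # this sketch skips that step and feeds the cloud to the head directly.
    displacements = shared_head(cloud, weights, bias)
    return [(x + dx, y + dy)
            for (x, y), (dx, dy) in zip(initial_shape, displacements)]
```

Because the head is shared across points, the number of landmarks can change without changing the number of parameters; only the per-point feature length (here 2 coordinates plus the channel count) fixes the weight shape.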

Paper Structure (7 sections, 1 equation, 2 figures, 1 table)

Figures (2)

  • Figure 1.1: A schematic overview of our pipeline for shape regression: A pretrained CNN backbone extracts image features, which are sampled in a point cloud for the geometric neural network (GNN) using a random initial shape of the training data. A shared MLP head finally predicts the shape directly or via relative displacement for each landmark.
  • Figure 1.2: Top left, a test image showing the initial, ground truth and predicted shape. Average surface distance (ASD) for the test split (top right) and results of our ablation study (bottom).
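
For readers unfamiliar with the average surface distance (ASD) reported in Figure 1.2, the following is a minimal sketch of one common symmetric formulation on contours given as point lists. The exact definition used in the paper is not stated in this summary, so the symmetric averaging, the point-to-point approximation of the contour, and the isotropic `spacing_mm` parameter are assumptions.

```python
import math


def mean_min_distance(src, dst):
    """Mean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def average_surface_distance(pred, target, spacing_mm=1.0):
    """Symmetric ASD between two contours, each a list of (x, y) pixel
    coordinates. spacing_mm converts pixels to millimetres and assumes
    isotropic pixels (an assumption of this sketch)."""
    forward = mean_min_distance(pred, target)
    backward = mean_min_distance(target, pred)
    return spacing_mm * (forward + backward) / 2.0
```

With densely sampled contours this point-based version approximates the distance between the underlying curves; with only a few landmarks per contour it overestimates that distance, so contours would typically be resampled first.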