Combining Image- and Geometric-based Deep Learning for Shape Regression: A Comparison to Pixel-level Methods for Segmentation in Chest X-Ray
Ron Keuth, Mattias Heinrich
TL;DR
This work tackles chest X-ray segmentation by integrating geometric shape reasoning into deep learning through a hybrid pipeline that pairs a CNN with a geometric neural network. A lightweight CNN backbone extracts image features, a geometric neural network processes a 2D landmark point cloud derived from an initial shape, and a shared MLP yields the final landmark positions or shape configurations. Compared to pixel-level baselines on the JSRT dataset, the shape-based approach with a Point Transformer is competitive when using the same backbone and is more robust to image distortions, outperforming the pixel-level decoders from a corruption level of about 30%. While nnU-Net remains the top performer overall, the study highlights the practical benefits of shape-based segmentation, including fewer anatomically implausible predictions and the potential for human-in-the-loop refinement, and it motivates further work with stronger backbones and cascade strategies.
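The data flow described above can be sketched roughly as follows. This is a minimal NumPy illustration with random weights, not the authors' implementation. Several parts are assumptions that the summary does not confirm: features are looked up at the landmark positions (`sample_features`), a single scaled dot-product self-attention layer stands in for the Point Transformer (which uses a different attention formulation), and the shared MLP predicts per-landmark offsets that are added to the initial shape. All function names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_features(feature_map, landmarks):
    """Nearest-neighbour lookup of one feature vector per landmark (assumption).

    feature_map: (C, H, W) array standing in for the CNN backbone output.
    landmarks:   (N, 2) array of (x, y) in normalised [0, 1] coordinates.
    """
    c, h, w = feature_map.shape
    xs = np.clip(np.rint(landmarks[:, 0] * (w - 1)).astype(int), 0, w - 1)
    ys = np.clip(np.rint(landmarks[:, 1] * (h - 1)).astype(int), 0, h - 1)
    return feature_map[:, ys, xs].T  # (N, C)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over points; a simple stand-in for the
    geometric network, not the actual Point Transformer layer."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return tokens + weights @ v  # residual connection

def shared_mlp(tokens, w1, w2):
    """The same two-layer MLP is applied to every point independently."""
    hidden = np.maximum(tokens @ w1, 0.0)  # ReLU
    return hidden @ w2  # (N, 2)

def regress_shape(feature_map, initial_shape, params):
    feats = sample_features(feature_map, initial_shape)
    tokens = np.concatenate([initial_shape, feats], axis=1)  # (N, 2 + C)
    tokens = self_attention(tokens, *params["attn"])
    # Assumption: the MLP output is an offset relative to the initial shape.
    return initial_shape + shared_mlp(tokens, *params["mlp"])

# Hypothetical sizes: 8 feature channels, 16x16 feature map, 5 landmarks.
C, H, W, N = 8, 16, 16, 5
D = C + 2
params = {
    "attn": [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)],
    "mlp": [rng.normal(scale=0.1, size=(D, 16)),
            rng.normal(scale=0.1, size=(16, 2))],
}
feature_map = rng.normal(size=(C, H, W))
initial_shape = rng.uniform(size=(N, 2))
refined = regress_shape(feature_map, initial_shape, params)
print(refined.shape)
```

Because the output is one coordinate pair per landmark, the prediction always has the topology of the initial shape, which is the property that guards against fragmented or otherwise implausible masks.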
Abstract
When solving a segmentation task, shape-based methods can be beneficial compared to pixelwise classification because they model the target object geometrically as a shape, which prevents anatomically implausible predictions, in particular on corrupted data. In this work, we propose a novel hybrid method that combines a lightweight CNN backbone with a geometric neural network (Point Transformer) for shape regression. Using the same CNN encoder, the Point Transformer reaches segmentation quality on par with current state-of-the-art convolutional decoders ($4\pm1.9$ vs $3.9\pm2.9$ error in mm and $85\pm13$ vs $88\pm10$ Dice) but, crucially, is more stable with respect to image distortion, starting to outperform them at a corruption level of 30%. Furthermore, we include nnU-Net as an upper baseline, which has $3.7\times$ more trainable parameters than our proposed method.
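For readers unfamiliar with the two reported metrics, the following pure-Python sketch shows one common way to compute them. It assumes that Dice is reported in percent and that the error in mm is the mean Euclidean distance between predicted and reference landmarks scaled by an isotropic pixel spacing; the abstract does not spell out either definition. The function names and the spacing value are hypothetical.

```python
import math

def dice_percent(pred_mask, target_mask):
    """Dice overlap in percent between two binary masks (nested lists of 0/1)."""
    inter = sum(p & t
                for pred_row, target_row in zip(pred_mask, target_mask)
                for p, t in zip(pred_row, target_row))
    total = sum(map(sum, pred_mask)) + sum(map(sum, target_mask))
    # Two empty masks agree perfectly by convention.
    return 100.0 if total == 0 else 100.0 * 2 * inter / total

def mean_landmark_error_mm(pred, target, spacing_mm):
    """Mean Euclidean landmark distance in pixels, converted to millimetres."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return spacing_mm * sum(dists) / len(dists)

# Toy inputs, not data from the paper.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
target_mask = [[0, 1, 0],
               [0, 1, 0]]
dice = dice_percent(pred_mask, target_mask)

pred_pts = [(0.0, 0.0), (3.0, 4.0)]
target_pts = [(0.0, 0.0), (0.0, 0.0)]
error = mean_landmark_error_mm(pred_pts, target_pts, spacing_mm=0.5)  # hypothetical spacing

print(dice, error)  # 80.0 1.25
```

The two metrics capture different things: Dice measures area overlap of the filled shape, while the landmark error measures how far individual contour points are from their reference positions.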
