DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation
Ron Keuth, Lasse Hansen, Maren Balks, Ronja Jäger, Anne-Nele Schröder, Ludger Tüshaus, Mattias Heinrich
TL;DR
DenseSeg addresses the challenge of performing semantic segmentation and landmark detection simultaneously in medical images by introducing a dense image-to-shape representation based on $uv$-maps. It uses a two-head UNet to jointly predict segmentation and $uv$-maps, optimizing a multi-term loss that includes $L_\text{BCE}$, $L_\phi$, $L_\text{LM}$, and $L_\text{TV}$. This yields explicit anatomical correspondences without requiring landmark-specific training. The method achieves competitive landmark accuracy on the JSRT thorax dataset and superior performance on the Graz pediatric wrist dataset, and it allows new landmarks to be added without retraining. These results illustrate the value of a dense geometric representation for challenging landmark detection tasks and suggest potential for extension to additional anatomical structures and clinical applications.
Abstract
Purpose: Semantic segmentation and landmark detection are fundamental tasks of medical image processing, facilitating further analysis of anatomical objects. Although deep learning-based pixel-wise classification has set a new state of the art for segmentation, it falls short in landmark detection, a strength of shape-based approaches. Methods: In this work, we propose a dense image-to-shape representation that enables the joint learning of landmarks and semantic segmentation by employing a fully convolutional architecture. Because it represents anatomical correspondences, our method intuitively allows the extraction of arbitrary landmarks. We benchmark our method against the state of the art for semantic segmentation (nnUNet), a shape-based approach employing geometric deep learning, and a convolutional neural network-based method for landmark detection. Results: We evaluate our method on two medical datasets: one common benchmark featuring the lungs, heart, and clavicle from thorax X-rays, and another with 17 different bones in the paediatric wrist. While our method is on par with the landmark detection baseline in the thorax setting (error in mm of $2.6\pm0.9$ vs $2.7\pm0.9$), it substantially surpasses it in the more complex wrist setting ($1.1\pm0.6$ vs $1.9\pm0.5$). Conclusion: We demonstrate that a dense geometric shape representation is beneficial for challenging landmark detection tasks and outperforms the previous state of the art based on heatmap regression. Moreover, it does not require explicit training on the landmarks themselves, so new landmarks can be added without retraining.
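
To illustrate why a dense correspondence map permits adding landmarks without retraining, the following minimal sketch looks up a landmark in a predicted $uv$-map. It is not the authors' implementation. It assumes the network output is available as a NumPy array of shape (2, H, W) alongside a binary segmentation mask, and that a landmark is defined by a fixed $uv$ coordinate on the shape. The function name `landmark_from_uv` and the nearest-value lookup rule are hypothetical choices made for illustration.

```python
import numpy as np


def landmark_from_uv(uv_map, mask, uv_target):
    """Return (row, col) of the foreground pixel whose uv value is closest to uv_target.

    uv_map:    float array of shape (2, H, W), channel 0 = u, channel 1 = v (assumed layout)
    mask:      bool array of shape (H, W), True inside the segmented structure
    uv_target: (u, v) coordinate of the landmark on the shape
    Returns None if the mask is empty.
    """
    target = np.asarray(uv_target, dtype=float).reshape(2, 1, 1)
    # Squared Euclidean distance in uv space for every pixel.
    dist = np.sum((uv_map - target) ** 2, axis=0)
    # Ignore pixels outside the segmentation.
    dist = np.where(mask, dist, np.inf)
    if not np.isfinite(dist).any():
        return None
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


# Toy example: a synthetic uv-map where u grows along columns and v along rows.
H, W = 4, 5
v, u = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
uv_map = np.stack([u, v])
mask = np.ones((H, W), dtype=bool)

print(landmark_from_uv(uv_map, mask, (0.5, 1.0)))  # (3, 2)
```

Under these assumptions, defining a new landmark amounts to choosing a new `uv_target`; the predicted map itself is unchanged, which is the sense in which no retraining is needed.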
