MASSM: An End-to-End Deep Learning Framework for Multi-Anatomy Statistical Shape Modeling Directly From Images
Janmesh Ukey, Tushar Kataria, Shireen Y. Elhabian
TL;DR
MASSM is an end-to-end multitask framework that simultaneously localizes multiple anatomies, estimates population-level and image-space shape representations, and delineates anatomy directly in 3D images. It comprises an Anatomy Detection Block, a Local Correspondences predictor, and a World Correspondences predictor, which generate local correspondences $\mathbf{L}_n^{(k)}$ and world correspondences $\mathbf{W}_n^{(k)}$ and thereby form Point Distribution Model (PDM) representations without manual pre-processing. The method integrates CenterNet-based detection, local particle prediction driven by ROI features, and differentiable Procrustes alignment for world correspondences, optimized with a combined loss $\mathcal{L} = \lambda_h \mathcal{L}_h + \lambda_r \mathcal{L}_r + \lambda_o \mathcal{L}_o + \lambda_l \mathcal{L}_l + \lambda_w \mathcal{L}_w$. Empirical results on the TotalSegmentator dataset show that MASSM outperforms segmentation-based baselines in surface-to-surface accuracy while providing richer shape priors and multi-anatomy efficiency, highlighting its potential for automated, scalable anatomical analysis. The work demonstrates that local correspondences yield superior shape information compared to pixel-wise segmentation and that a single multi-anatomy model can achieve substantial training-time gains without sacrificing performance.
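To make the Procrustes step concrete, the following is a minimal NumPy sketch of closed-form similarity alignment (scale, rotation, translation) between two corresponding point sets. It is an illustration under stated assumptions, not the MASSM implementation: the function name `procrustes_align`, the choice of a fixed reference shape as the alignment target, and the use of NumPy instead of an autodiff framework are all hypothetical. In a trained network the same operations would be expressed with differentiable tensor ops so that gradients flow through the alignment.

```python
import numpy as np


def procrustes_align(source, target):
    """Similarity-align `source` (M, 3) onto `target` (M, 3).

    Hypothetical helper: rows of both arrays are assumed to be in
    one-to-one correspondence, as particles in a PDM are.
    Returns the aligned points and the (scale, rotation, translation).
    """
    mu_s = source.mean(axis=0)
    mu_t = target.mean(axis=0)
    S = source - mu_s  # centered source particles
    T = target - mu_t  # centered target particles

    # 3x3 cross-covariance between the two centered point sets.
    H = S.T @ T
    U, sig, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T

    # Least-squares optimal isotropic scale and translation.
    scale = (sig * np.diag(D)).sum() / (S ** 2).sum()
    t = mu_t - scale * (R @ mu_s)

    aligned = scale * (source @ R.T) + t
    return aligned, (scale, R, t)
```

Here `source` plays the role of image-space (local) particles for one anatomy and `target` an assumed reference shape in the shared world frame; the aligned output is the analogue of a world correspondence set.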
Abstract
Statistical Shape Modeling (SSM) effectively analyzes anatomical variations within populations but is limited by the need for manual localization and segmentation, which relies on scarce medical expertise. Recent advances in deep learning offer a promising alternative: models that automatically generate statistical representations (as point distribution models, or PDMs) from unsegmented images. Once trained, these models eliminate the need for manual segmentation of new subjects. However, most deep learning methods still require manual pre-alignment of image volumes and a bounding box specified around the target anatomy, so inference remains partially manual. More recent approaches automate anatomy localization but only estimate population-level statistical representations and cannot directly delineate anatomy in images. They are also limited to modeling a single anatomy. We introduce MASSM, a novel end-to-end deep learning framework that simultaneously localizes multiple anatomies, estimates population-level statistical representations, and delineates shape representations directly in image space. Our results show that MASSM, which delineates anatomy in image space and handles multiple anatomies through a multitask network, provides superior shape information compared to segmentation networks for medical imaging tasks. Estimating a statistical shape model is a stronger task than segmentation, as it encodes a more robust statistical prior for the objects to be detected and delineated. MASSM therefore yields more accurate and comprehensive shape representations than traditional pixel-wise segmentation.
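As a brief illustration of what a population-level PDM representation provides, the sketch below computes a mean shape and principal modes of variation from a stack of corresponding particle sets. It is a generic PCA example under stated assumptions, not part of MASSM: the function name `pdm_modes` and the input layout `(N, M, 3)` (N subjects, M particles each, already aligned to a common frame) are hypothetical.

```python
import numpy as np


def pdm_modes(world_particles):
    """PCA over a point distribution model.

    `world_particles` is an assumed (N, M, 3) array: N subjects, each with
    M corresponding particles in a shared coordinate frame.
    Returns the mean shape (3M,), the modes as rows of a matrix, and the
    fraction of total variance explained by each mode.
    """
    n_subjects = world_particles.shape[0]
    X = world_particles.reshape(n_subjects, -1)  # one flattened shape per row
    mean_shape = X.mean(axis=0)
    Xc = X - mean_shape

    # SVD of the centered data gives the principal directions directly.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = s ** 2 / (n_subjects - 1)
    explained = variances / variances.sum()
    return mean_shape, Vt, explained
```

A new shape can then be described by a handful of mode coefficients rather than by a per-voxel label map, which is the sense in which a PDM carries a statistical prior that a segmentation mask does not.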
