
MASSM: An End-to-End Deep Learning Framework for Multi-Anatomy Statistical Shape Modeling Directly From Images

Janmesh Ukey, Tushar Kataria, Shireen Y. Elhabian

TL;DR

MASSM introduces an end-to-end multitask framework that simultaneously localizes multiple anatomies, estimates population-level and image-space shape representations, and delineates anatomy directly in 3D images. It comprises an Anatomy Detection Block, a Local Correspondences predictor, and a World Correspondences predictor to generate local $\mathbf{L}_n^{k}$ and world $\mathbf{W}_n^{(k)}$ correspondences, forming robust Point Distribution Model (PDM) representations without manual pre-processing. The method integrates CenterNet-based detection, ROI-feature–driven local particle prediction, and differentiable Procrustes alignment for world correspondence, optimized by a combined loss $\mathcal{L} = \lambda_h \mathcal{L}_h + \lambda_r \mathcal{L}_r + \lambda_o \mathcal{L}_o + \lambda_l \mathcal{L}_l + \lambda_w \mathcal{L}_w$. Empirical results on the TotalSegmentator dataset show MASSM outperforms segmentation-based baselines in surface-to-surface accuracy and, importantly, provides richer shape priors and multi-anatomy efficiency, highlighting its potential for automated, scalable anatomical analysis. The work demonstrates that local correspondences yield superior shape information compared to pixel-wise segmentation and that a single multi-anatomy model can achieve substantial training-time gains without sacrificing performance.
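The combined objective above is a weighted sum of five terms. The snippet below is a minimal sketch of that sum only, assuming each per-term loss has already been reduced to a scalar. The function name `combined_loss`, the dictionary layout, and the numeric values are hypothetical illustrations; this summary does not state what each subscript measures or what weights the authors use.

```python
from typing import Mapping

# Subscripts as they appear in the formula: h, r, o, l, w.
# What each term measures is not specified in this summary.
LOSS_TERMS = ("h", "r", "o", "l", "w")


def combined_loss(terms: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return sum over k of lambda_k * L_k for k in {h, r, o, l, w}."""
    missing = [k for k in LOSS_TERMS if k not in terms or k not in weights]
    if missing:
        raise KeyError(f"missing loss terms or weights: {missing}")
    return sum(weights[k] * terms[k] for k in LOSS_TERMS)


# Hypothetical values, chosen only to make the arithmetic easy to follow.
example_terms = {"h": 0.5, "r": 0.25, "o": 0.125, "l": 2.0, "w": 4.0}
example_weights = {"h": 1.0, "r": 2.0, "o": 4.0, "l": 0.5, "w": 0.25}

total = combined_loss(example_terms, example_weights)
print(total)  # 0.5 + 0.5 + 0.5 + 1.0 + 1.0 = 3.5
```

In a training loop the scalar inputs would be differentiable tensors rather than Python floats, but the weighting structure would stay the same.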

Abstract

Statistical Shape Modeling (SSM) effectively analyzes anatomical variations within populations but is limited by the need for manual localization and segmentation, which relies on scarce medical expertise. Recent advances in deep learning have provided a promising approach that automatically generates statistical representations (as point distribution models or PDMs) from unsegmented images. Once trained, these deep learning-based models eliminate the need for manual segmentation for new subjects. Most deep learning methods still require manual pre-alignment of image volumes and bounding box specification around the target anatomy, leading to a partially manual inference process. Recent approaches facilitate anatomy localization but only estimate population-level statistical representations and cannot directly delineate anatomy in images. Additionally, they are limited to modeling a single anatomy. We introduce MASSM, a novel end-to-end deep learning framework that simultaneously localizes multiple anatomies, estimates population-level statistical representations, and delineates shape representations directly in image space. Our results show that MASSM, which delineates anatomy in image space and handles multiple anatomies through a multitask network, provides superior shape information compared to segmentation networks for medical imaging tasks. Estimating Statistical Shape Models (SSM) is a stronger task than segmentation, as it encodes a more robust statistical prior for the objects to be detected and delineated. MASSM allows for more accurate and comprehensive shape representations, surpassing the capabilities of traditional pixel-wise segmentation.

Paper Structure (11 sections, 7 equations, 6 figures)


Figures (6)

  • Figure 1: Multi-Anatomy Statistical Shape Model (MASSM). Block diagram of the proposed end-to-end method for obtaining statistical shape representations of multiple anatomies simultaneously. The proposed model has three networks: (a) Anatomy Detection, which extracts the different anatomies of interest; (b) Local Correspondences, which predicts local particle correspondences; and (c) World Correspondences, which predicts world correspondences.
  • Figure 2: Local Correspondence Prediction Results. (A) RMSE on local particles reported for all seven anatomies under test when compared with DeepSSM-L (Bhalodia et al., 2018). (B) Surface-to-surface distance (mm) for local particle reconstructions reported for seven anatomies.
  • Figure 3: Surface-to-surface distance errors for shape reconstruction of local correspondences, depicted as a heatmap on ground truth reconstructed meshes, showcasing best and worst-case scenarios across 7 anatomies. Heart ventricle left (HVL), heart ventricle right (HVR), heart atrium left (HAL), heart atrium right (HAR), lung upper lobe left (LULL), lung upper lobe right (LULR) and spleen (S).
  • Figure 4: World Particle Prediction Results. (A) RMSE on world particles reported for seven anatomies. (B) Surface-to-surface distance (mm) for world particle reconstructions reported for seven anatomies.
  • Figure 5: Surface-to-surface distance errors for shape reconstruction of world correspondences, depicted as a heatmap on ground truth reconstructed meshes, showcasing best and worst-case scenarios across 7 anatomies. Heart ventricle left (HVL), heart ventricle right (HVR), heart atrium left (HAL), heart atrium right (HAR), lung upper lobe left (LULL), lung upper lobe right (LULR) and spleen (S).
  • ...and 1 more figure