High-Fidelity Medical Shape Generation via Skeletal Latent Diffusion

Guoqing Zhang, Jingyun Yang, Siqi Chen, Anping Zhang, Yang Li

TL;DR

This work proposes a skeletal latent diffusion framework that explicitly incorporates structural priors for efficient, high-fidelity medical shape generation. Its shape auto-encoder pairs an encoder, which captures global geometric information through a differentiable skeletonization module and aggregates local surface features into shape latents, with a decoder that predicts the corresponding implicit fields over sparsely sampled coordinates.

Abstract

Anatomical shape modeling is a fundamental problem in medical data analysis. However, the geometric complexity and topological variability of anatomical structures pose significant challenges to accurate anatomical shape generation. In this work, we propose a skeletal latent diffusion framework that explicitly incorporates structural priors for efficient and high-fidelity medical shape generation. We introduce a shape auto-encoder in which the encoder captures global geometric information through a differentiable skeletonization module and aggregates local surface features into shape latents, while the decoder predicts the corresponding implicit fields over sparsely sampled coordinates. New shapes are generated via a latent-space diffusion model, followed by neural implicit decoding and mesh extraction. To address the limited availability of medical shape data, we construct a large-scale dataset, MedSDF, comprising surface point clouds and corresponding signed distance fields across multiple anatomical categories. Extensive experiments on MedSDF and vessel datasets demonstrate that the proposed method achieves superior reconstruction and generation quality while maintaining higher computational efficiency than existing approaches. Code is available at: https://github.com/wlsdzyzl/meshage.
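To make concrete what a "surface point cloud with a corresponding signed distance field" pair looks like, here is a minimal sketch on an analytic sphere rather than real anatomy. It is illustrative only and is not taken from the MedSDF pipeline: the function names `sphere_sdf` and `sample_sphere_surface`, the sample counts, and the sign convention (negative inside, positive outside) are all assumptions.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Signed distance to an origin-centred sphere.

    Assumed convention: negative inside the shape, positive outside.
    """
    return np.linalg.norm(points, axis=-1) - radius

def sample_sphere_surface(n, radius=1.0, rng=None):
    """Draw n points uniformly on the sphere by normalising Gaussian samples."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return radius * v

rng = np.random.default_rng(0)

# Encoder-side input in this toy setup: a surface point cloud.
surface = sample_sphere_surface(2048, rng=rng)

# Decoder-side supervision in this toy setup: SDF values at query coordinates.
queries = rng.uniform(-1.5, 1.5, size=(4096, 3))
sdf = sphere_sdf(queries)
```

In a learned setting, the decoder would be trained to regress values like `sdf` at `queries` given latents computed from `surface`; a mesh can then be extracted from the zero level set of the predicted field.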

Paper Structure (21 sections, 10 equations, 5 figures, 4 tables)


Figures (5)

  • Figure 1: A semantic overview of the proposed architecture. We first train a VAE where the encoder maps the input surface and online-computed skeleton to a latent point set, which serves as a shape signature for the decoder to predict SDF values at queried spatial coordinates. A diffusion model is then trained in the latent feature space. For shape synthesis, new latents are sampled via the reverse diffusion process and decoded into neural implicit fields, which are subsequently converted into 3D shapes.
  • Figure 2: Left: differentiable skeletonization. Right: skeleton-guided coordinate sampling; only $10\%$ of voxels (green grids) near the skeletal points (blue dots) are sampled for SDF calculation during inference.
  • Figure 3: Qualitative comparison of shape reconstruction on the MedSDF dataset. From top to bottom, the rows correspond to samples of the brain, colon, left coronary artery (LCA), duodenum, liver, and stomach. GeM3D (S) and Ours (S) denote the skeletal points computed by GeM3D and our method, respectively.
  • Figure 4: Qualitative comparison of shape generation on the MedSDF dataset. From top to bottom, the rows correspond to samples of the brain, colon, LCA, duodenum, liver, and stomach. For each category and method, we generate 500 samples and display the two nearest neighbors of the reference point cloud.
  • Figure 5: Qualitative comparison of tubular shape reconstruction and generation. The top two rows correspond to reconstruction results of input point clouds from the CoW and ImageCAS datasets, while the bottom two rows display the neighbors of input point clouds among the generated shapes.
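
The skeleton-guided coordinate sampling described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selection rule (keep the fraction of voxel centres closest to any skeletal point), the function name `skeleton_guided_voxels`, the grid resolution, and the normalised bounds are all hypothetical.

```python
import numpy as np

def skeleton_guided_voxels(skeleton, resolution=32, keep_fraction=0.10,
                           bounds=(-1.0, 1.0)):
    """Return the voxel centres nearest to the skeleton, plus their distances.

    Assumed rule: rank every voxel centre by distance to its closest
    skeletal point and keep the nearest `keep_fraction` of them.
    """
    lo, hi = bounds
    step = (hi - lo) / resolution
    axis = lo + step * (np.arange(resolution) + 0.5)  # voxel centres per axis
    centers = np.stack(
        np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1
    ).reshape(-1, 3)

    # Distance from each voxel centre to its nearest skeletal point.
    nearest = np.full(len(centers), np.inf)
    for p in skeleton:
        nearest = np.minimum(nearest, np.linalg.norm(centers - p, axis=1))

    # Keep the closest fraction; argpartition avoids a full sort.
    n_keep = max(1, int(round(keep_fraction * len(centers))))
    keep = np.argpartition(nearest, n_keep - 1)[:n_keep]
    return centers[keep], nearest[keep]

# Toy skeleton: a straight centreline along the z-axis (hypothetical data).
z = np.linspace(-0.8, 0.8, 64)
skeleton = np.stack([np.zeros_like(z), np.zeros_like(z), z], axis=1)

coords, dist = skeleton_guided_voxels(skeleton)
```

Only `coords` would then be passed to the decoder for SDF evaluation, so the cost of inference scales with the kept fraction rather than with the full grid. For thin tubular structures such as vessels, most of the grid lies far from the surface, which is why restricting queries to the skeleton's neighborhood is plausible.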