Energy-Based Prior Latent Space Diffusion model for Reconstruction of Lumbar Vertebrae from Thick Slice MRI
Yanke Wang, Yolanne Y. R. Lee, Aurelio Dolfini, Markus Reischl, Ender Konukoglu, Kyriakos Flouris
TL;DR
The study addresses the reconstruction of lumbar vertebrae from thick-slice MRI by introducing LSD-EBM, a latent-space diffusion energy-based prior that operates on a latent variable $\mathbf{z}$ and uses a neural energy function $\mathrm{E}_\alpha(\mathbf{z},t)$ to guide diffusion in a compact space. Integrated into a ReconNet-based post-processing pipeline, LSD-EBM yields higher-fidelity 3D vertebral reconstructions than VAE and latent EBM baselines, as shown by better Dice, VS, and information-theoretic metrics on paired MRI-CT data, while keeping training and inference times practical. Diffusion in the latent space is stable, converges faster, and is computationally cheaper than diffusion on full images, which suggests a feasible path toward MRI-based vertebra modeling that avoids ionizing radiation while preserving anatomical detail. Overall, the work provides a practical, data-driven prior for 3D medical image reconstruction that improves anatomical realism and could support treatment planning from MRI alone.
Abstract
Lumbar spine problems are ubiquitous, motivating research into targeted imaging for treatment planning and guided interventions. While high-resolution, high-contrast CT has been the modality of choice, MRI can capture both bone and soft tissue without the ionizing radiation of CT, albeit with longer acquisition times. The critical trade-off between contrast quality and acquisition time has motivated 'thick slice MRI', which prioritises faster imaging with high in-plane resolution but variable contrast and low through-plane resolution. We investigate a recently developed post-acquisition pipeline which segments vertebrae from thick-slice acquisitions and uses a variational autoencoder to enhance quality after an initial 3D reconstruction. In place of the variational autoencoder, we propose a latent space diffusion energy-based prior, which leverages the high-quality image generation of diffusion models. Crucially, we mitigate their high computational cost and low sample efficiency by learning an energy-based latent representation in which the diffusion process is performed. Our resulting method outperforms existing approaches across metrics including Dice and VS scores, and more faithfully captures 3D features.
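For readers unfamiliar with the overlap metrics named above, the following is a minimal sketch of how Dice and VS could be computed on binary 3D segmentation masks. It assumes that VS denotes the commonly used volumetric similarity, $1 - \big||A| - |B|\big| / (|A| + |B|)$, which the abstract does not define; the function names `dice_score` and `volume_similarity` and the toy masks are illustrative and not taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = int(pred.sum()) + int(target.sum())
    if total == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    overlap = int(np.logical_and(pred, target).sum())
    return 2.0 * overlap / total


def volume_similarity(pred, target):
    """Assumed VS definition: 1 - ||A| - |B|| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    vol_pred, vol_target = int(pred.sum()), int(target.sum())
    total = vol_pred + vol_target
    if total == 0:
        return 1.0
    return 1.0 - abs(vol_pred - vol_target) / total


# Toy 4x4x4 volumes: both masks hold 8 voxels, shifted by one voxel
# along the last axis so that only 4 voxels overlap.
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 2:4] = True

print(dice_score(pred, target))         # 0.5
print(volume_similarity(pred, target))  # 1.0
```

The toy case shows why the two scores are reported together: under the assumed definition, VS compares only the segmented volumes and stays at 1.0 for a shifted mask, whereas Dice penalises the spatial misalignment.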
