Energy-Based Prior Latent Space Diffusion model for Reconstruction of Lumbar Vertebrae from Thick Slice MRI

Yanke Wang, Yolanne Y. R. Lee, Aurelio Dolfini, Markus Reischl, Ender Konukoglu, Kyriakos Flouris

TL;DR

The study addresses the reconstruction of lumbar vertebrae from thick-slice MRI by introducing LSD-EBM, a latent-space diffusion energy-based prior that operates on a latent variable $\mathbf{z}$ and uses a neural energy function $\mathrm{E}_\alpha(\mathbf{z},t)$ to guide diffusion in a compact space. Integrated into a ReconNet-based post-processing pipeline, LSD-EBM yields higher-fidelity 3D vertebral reconstructions than VAE and latent EBM baselines, as evidenced by superior Dice, VS, and information-theoretic metrics on paired MRI-CT data, while maintaining practical training and inference times. The method demonstrates stable latent-space diffusion with faster convergence and a lower computational burden than full-image diffusion, pointing to a feasible path toward safer MRI-based vertebrae modeling with detailed anatomical fidelity. Overall, the work provides a practical, data-driven prior for 3D medical image reconstruction that improves clinical realism and may support treatment planning using MRI alone.

Abstract

Lumbar spine problems are ubiquitous, motivating research into targeted imaging for treatment planning and guided interventions. While high-resolution, high-contrast CT has been the modality of choice, MRI can capture both bone and soft tissue without the ionizing radiation of CT, albeit with longer acquisition times. The critical trade-off between contrast quality and acquisition time has motivated 'thick slice MRI', which prioritises faster imaging with high in-plane resolution but variable contrast and low through-plane resolution. We investigate a recently developed post-acquisition pipeline which segments vertebrae from thick-slice acquisitions and uses a variational autoencoder to enhance quality after an initial 3D reconstruction. We instead propose a latent space diffusion energy-based prior to leverage diffusion models, which excel at high-quality image generation. Crucially, we mitigate their high computational cost and low sample efficiency by learning an energy-based latent representation in which to perform the diffusion processes. Our resulting method outperforms existing approaches across metrics including Dice and VS scores, and more faithfully captures 3D features.
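
The two overlap metrics named in the abstract can be illustrated with a minimal sketch. It assumes that "VS" denotes the commonly used volumetric similarity, $1 - |V_P - V_R| / (V_P + V_R)$, and that masks are flattened binary voxel labels; the function names `dice` and `volumetric_similarity` are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple


def _counts(pred: Iterable[int], ref: Iterable[int]) -> Tuple[int, int, int]:
    """Return (|P|, |R|, |P and R|) for two flattened binary masks."""
    pred, ref = list(pred), list(ref)
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    n_pred = sum(1 for p in pred if p)
    n_ref = sum(1 for r in ref if r)
    n_both = sum(1 for p, r in zip(pred, ref) if p and r)
    return n_pred, n_ref, n_both


def dice(pred: Iterable[int], ref: Iterable[int]) -> float:
    """Dice coefficient: 2|P and R| / (|P| + |R|); 1.0 if both masks are empty."""
    n_pred, n_ref, n_both = _counts(pred, ref)
    total = n_pred + n_ref
    return 1.0 if total == 0 else 2.0 * n_both / total


def volumetric_similarity(pred: Iterable[int], ref: Iterable[int]) -> float:
    """Assumed VS definition: 1 - ||P| - |R|| / (|P| + |R|); 1.0 if both empty."""
    n_pred, n_ref, _ = _counts(pred, ref)
    total = n_pred + n_ref
    return 1.0 if total == 0 else 1.0 - abs(n_pred - n_ref) / total


if __name__ == "__main__":
    # Toy flattened masks (illustrative values, not data from the paper).
    prediction = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 1, 1, 0]
    print("Dice:", dice(prediction, reference))
    print("VS:", volumetric_similarity(prediction, reference))
```

Under this definition VS compares only the segmented volumes, so two masks of equal size score perfectly on VS even when they overlap imperfectly, which is one reason it is usually reported alongside Dice rather than instead of it.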

Paper Structure

This paper contains 23 sections, 24 equations, 17 figures, 3 tables, 3 algorithms.

Figures (17)

  • Figure 1: Schematic diagram of the segmentation and reconstruction of high-quality lumbar vertebrae MRI images, with the pipeline proposed by Turella et al. (2021) shown in italics. We focus on the generative method of the post-processing step marked in red.
  • Figure 2: Schematic diagram of our network structure and the proposed LSD-EBM. The input is encoded into the latent space $\mathbf{z}$, where a forward diffusion process is constructed and a reverse process with a conditional energy prior is learned. $\mathbf{z}_0$ is then decoded back into the image dimensions.
  • Figure 4: Visualization of the reconstruction results of VAE, LEBM, and LSD-EBM on low-quality MRI, with the high-quality CT reference image on the right. For LEBM and LSD-EBM, the intermediate reconstructions from the latent space at 2, 15, and 20 time steps are also shown. The red boxes denote regions of interest for qualitative comparison.
  • Figure 5: The mean variance of the latent variables (left) during the MCMC sampling process in LEBM and (right) during the diffusion and denoising processes in LSD-EBM. The different shades represent repetitions. The arrows denote the time direction of the respective process.
  • Figure 6: Schematics of different existing methods, focusing on the processes that increase or decrease the noise in the data and its quality. From left to right: EBM, LEBM, diffusion model, diffusion EBM, and ours. The blue and orange arrows indicate the forward and backward processes respectively, in constant dimension. The black arrows indicate an encoder or decoder depending on their location, and Greek letters indicate a parameter space. $p^{E}$ and $p^{CE}$ mark processes based on EBM and conditional EBM respectively.
  • ...and 12 more figures