Volumetric Conditioning Module to Control Pretrained Diffusion Models for 3D Medical Images

Suhyun Ahn, Wonjung Park, Jihoon Cho, Seunghyuck Park, Jinah Park

TL;DR

The paper addresses the challenge of conditioning pretrained diffusion models for 3D medical images under limited data and computational resources. It introduces the Volumetric Conditioning Module (VCM), a lightweight, time-conditioned asymmetric U-Net that attaches to a frozen pretrained diffusion model (BrainLDM) and modulates diffusion priors via per-timestep parameters $\gamma_t$ and $\beta_t$, enabling accurate spatial conditioning. VCM supports single- and multimodal conditioning and demonstrates strong data efficiency and competitive performance in tasks including volumetric data synthesis and axial super-resolution, outperforming several 2D-inspired spatial-control baselines, especially in low-data regimes. The method achieves effective condition alignment with substantially lower memory and training data requirements, suggesting practical utility for data augmentation, translation, and other downstream medical-imaging applications on standard enterprise GPUs.
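The per-timestep modulation described above can be illustrated with a minimal sketch. It assumes an affine, voxelwise form (1 + gamma_t) * eps + beta_t applied to the frozen model's prediction eps. It also replaces the asymmetric 3D U-Net with a hypothetical toy function, toy_condition_module, whose gain depends on the timestep through a standard sinusoidal embedding. The exact parameterization of gamma_t and beta_t in VCM may differ.

```python
import math
from typing import List, Tuple

Volume = List[float]  # a flattened 3D volume (D*H*W voxels)


def timestep_embedding(t: int, dim: int = 4) -> List[float]:
    """Sinusoidal timestep embedding of the kind used in DDPM-style U-Nets."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def toy_condition_module(
    cond: Volume, t: int, w_gamma: List[float], w_beta: List[float]
) -> Tuple[Volume, Volume]:
    """Hypothetical stand-in for VCM: maps (condition, timestep) to voxelwise gamma_t, beta_t.

    A real module would be a 3D asymmetric U-Net. Here a timestep-dependent
    scalar gain simply scales the condition volume.
    """
    emb = timestep_embedding(t, len(w_gamma))
    g_t = sum(w * e for w, e in zip(w_gamma, emb))  # timestep-dependent scale gain
    b_t = sum(w * e for w, e in zip(w_beta, emb))   # timestep-dependent shift gain
    gamma = [g_t * c for c in cond]
    beta = [b_t * c for c in cond]
    return gamma, beta


def modulate(eps: Volume, gamma: Volume, beta: Volume) -> Volume:
    """Assumed affine modulation of the frozen prediction: (1 + gamma) * eps + beta."""
    return [(1.0 + g) * e + b for e, g, b in zip(eps, gamma, beta)]
```

With all-zero weights the toy module returns gamma_t = beta_t = 0, so the frozen prediction passes through unchanged. This is one common way to make an added control module a no-op at initialization. It is assumed here, not taken from the paper. Voxels where the condition is zero are likewise left untouched in this toy.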

Abstract

Spatial control methods using additional modules on pretrained diffusion models have gained attention for enabling conditional generation in natural images. These methods guide the generation process with new conditions while leveraging the capabilities of large models. They could be beneficial as training strategies in the context of 3D medical imaging, where training a diffusion model from scratch is challenging due to high computational costs and data scarcity. However, the potential application of spatial control methods with additional modules to 3D medical images has not yet been explored. In this paper, we present a tailored spatial control method for 3D medical images with a novel lightweight module, the Volumetric Conditioning Module (VCM). Our VCM employs an asymmetric U-Net architecture to effectively encode complex information from various levels of 3D conditions, providing detailed guidance in image synthesis. To examine the applicability of spatial control methods and the effectiveness of VCM for 3D medical data, we conduct experiments under single- and multimodal condition scenarios across a wide range of dataset sizes, from extremely small datasets with 10 samples to large datasets with 500 samples. The experimental results show that VCM is effective for conditional generation and efficient in terms of requiring less training data and computational resources. We further investigate potential applications of our spatial control method through axial super-resolution for medical images. Our code is available at https://github.com/Ahn-Ssu/VCM
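For the axial super-resolution application, the condition is a volume with fewer axial slices than the target. The sketch below is a hedged illustration only, because the degradation used in the paper is not described here. It simulates a thick-slice scan by keeping every factor-th axial slice, then repeats the kept slices so the condition volume matches the target depth. The [z][y][x] layout with z as the axial axis, and the helper names drop_axial_slices and nearest_upsample_axial, are assumptions.

```python
from typing import List

Slice = List[List[float]]
Volume3D = List[Slice]  # indexed [z][y][x]; z is assumed to be the axial axis


def drop_axial_slices(vol: Volume3D, factor: int) -> Volume3D:
    """Simulate a thick-slice acquisition by keeping every `factor`-th axial slice."""
    return vol[::factor]


def nearest_upsample_axial(low: Volume3D, factor: int, depth: int) -> Volume3D:
    """Repeat each kept slice (nearest neighbour along z) to restore `depth` slices."""
    out: Volume3D = []
    for z in range(depth):
        src = min(z // factor, len(low) - 1)  # index of the nearest kept slice
        out.append([row[:] for row in low[src]])  # copy rows so slices are independent
    return out
```

Nearest-slice repetition is only one simple choice for aligning the condition with the target grid. Interpolating along z would be an equally plausible alternative.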

Paper Structure

This paper contains 23 sections, 4 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Illustration of tasks using the proposed method, the Volumetric Conditioning Module (VCM). On top of a large pretrained diffusion model such as BrainLDM [pinaya2022brain_brainLDM], VCM controls spatially fine-grained layouts using various new conditions. By leveraging the generative abilities of diffusion models, VCM can perform a variety of medical-imaging tasks, such as (b-d) data synthesis with labels, (e) super-resolution, and (f) image translation. The conditions used are shown in blue boxes, including the 1D scalar inputs to BrainLDM.
  • Figure 2: Comparison of control schemes. ControlNet [zhang2023adding_controlNet] and T2I-Adapter [mou2023t2iadapter] employ feature fusion to inject guidance, whereas MCM [ham2023modulating_mcm], a remarkably lightweight module, and our VCM adopt a modulation approach.
  • Figure 3: Illustration of the proposed VCM pipeline. VCM sits on top of a pretrained diffusion model and modulates the model's output, taking the diffusion priors and the new conditions as inputs. More details are given in supply:impl and fig:VCMdetails_modalitydrop.
  • Figure 4: Synthetic images from various conditional generation methods trained on 50 training samples with the LV mask condition.
  • Figure 5: Synthetic images from various spatial control methods trained on 50 training samples with multimodal conditions.
  • ...and 12 more figures
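The two control schemes compared in Figure 2 differ mainly in where the guidance enters the pretrained model. The toy sketch below uses hypothetical two-layer "frozen" functions (frozen_layer1, frozen_layer2) as a stand-in for the pretrained denoiser. It contrasts feature fusion, which adds control features to intermediate activations, with modulation, which calls the frozen model as a black box and only rescales and shifts its output. This is a simplification. Real ControlNet and T2I-Adapter inject features at specific U-Net stages through learned convolutions. The (1 + gamma) * out + beta form used for modulation is an assumption, not the paper's stated equation.

```python
from typing import List

Vec = List[float]


def frozen_layer1(x: Vec) -> Vec:
    """Hypothetical first frozen layer of a pretrained denoiser."""
    return [0.5 * v for v in x]


def frozen_layer2(h: Vec) -> Vec:
    """Hypothetical second frozen layer of a pretrained denoiser."""
    return [v + 1.0 for v in h]


def frozen_model(x: Vec) -> Vec:
    """The pretrained model viewed as a black box."""
    return frozen_layer2(frozen_layer1(x))


def fusion_forward(x: Vec, control_feats: List[Vec]) -> Vec:
    """Feature fusion (ControlNet / T2I-Adapter style, simplified).

    Control features are added to the frozen model's intermediate activations,
    so the model's internals must be exposed to the control branch.
    """
    h = frozen_layer1(x)
    h = [a + c for a, c in zip(h, control_feats[0])]  # inject after layer 1
    out = frozen_layer2(h)
    return [a + c for a, c in zip(out, control_feats[1])]  # inject at the output


def modulation_forward(x: Vec, gamma: Vec, beta: Vec) -> Vec:
    """Modulation (MCM / VCM style, simplified).

    The frozen model runs unchanged and only its output is scaled and shifted.
    """
    out = frozen_model(x)
    return [(1.0 + g) * o + b for o, g, b in zip(out, gamma, beta)]
```

Even in this toy, the fusion path needs the output of frozen_layer1, whereas modulation_forward only calls frozen_model. This illustrates why a modulation-based module can remain small and independent of the pretrained model's internal layers. With zero control inputs, both paths reproduce the frozen output.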