Conditional Variational Diffusion Models

Gabriel della Maggiora, Luis Alberto Croquevielle, Nikita Deshpande, Harry Horsley, Thomas Heinis, Artur Yakimovich

TL;DR

This work introduces Conditional Variational Diffusion Models (CVDM) that learn the diffusion schedule during training and support pixel-wise conditioning for inverse problems. By factorizing the forward dynamics and enforcing a monotone, time-dependent schedule through a separable $\beta_t(\mathbf{x})$ and $\gamma(t, \mathbf{x})$, the approach achieves strong performance on BioSR super-resolution and quantitative phase imaging (QPI), often surpassing fine-tuned diffusion baselines. A regularization term $L_\gamma$ stabilizes training, ensuring gradual noise injection and preventing degenerate solutions. Across BioSR, synthetic QPI, and real brightfield QPI, CVDM delivers competitive or superior reconstruction quality with minimal fine-tuning, highlighting its versatility and potential for clinical imaging applications.
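To make the idea of a monotone, pixel-wise schedule concrete, here is a minimal NumPy sketch. It rests on assumptions that are not stated in this summary: the separable form is taken to be $\beta(t, \mathbf{x}) = \tau'(t)\,\lambda(\mathbf{x})$, and $\gamma$ is taken to follow the usual variance-preserving relation $\gamma(t, \mathbf{x}) = \exp(-\int_0^t \beta(s, \mathbf{x})\,ds)$. The names `separable_schedule`, `tau_rate`, and `lam` are hypothetical, and the fixed functions used here stand in for quantities that the method learns during training.

```python
import numpy as np

def separable_schedule(tau_rate, lam, t_grid):
    """Illustrative separable schedule (assumed form, not the authors' code).

    beta(t, x)  = tau_rate(t) * lam(x), with both factors non-negative.
    gamma(t, x) = exp(-integral_0^t beta(s, x) ds), so gamma starts at 1
    and can only decrease in t at every pixel.
    """
    rate = np.asarray(tau_rate(t_grid), dtype=float)   # shape (T,)
    beta = rate[:, None, None] * lam[None, :, :]       # shape (T, H, W)

    # Cumulative trapezoidal integral of beta over t, starting from 0.
    dt = np.diff(t_grid)
    increments = 0.5 * (beta[1:] + beta[:-1]) * dt[:, None, None]
    integral = np.concatenate(
        [np.zeros((1,) + lam.shape), np.cumsum(increments, axis=0)], axis=0
    )
    gamma = np.exp(-integral)
    return beta, gamma

# Hypothetical inputs: a linear time rate and a 2x2 positive pixel-wise factor.
t_grid = np.linspace(0.0, 1.0, 101)
lam = np.array([[0.5, 1.0], [1.5, 2.0]])
beta, gamma = separable_schedule(lambda t: 0.1 + 19.9 * t, lam, t_grid)
```

Because both factors are non-negative, the integral of $\beta$ never decreases in $t$, so $\gamma$ is monotone by construction, and pixels with a larger `lam` value lose signal faster. This only illustrates why the parametrization yields a valid schedule; it says nothing about how the factors are trained or regularized.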

Abstract

Inverse problems aim to determine parameters from observations, a crucial task in engineering and science. Lately, generative models, especially diffusion models, have gained popularity in this area for their ability to produce realistic solutions and their good mathematical properties. Despite their success, an important drawback of diffusion models is their sensitivity to the choice of variance schedule, which controls the dynamics of the diffusion process. Fine-tuning this schedule for specific applications is crucial but time-costly and does not guarantee an optimal result. We propose a novel approach for learning the schedule as part of the training process. Our method supports probabilistic conditioning on data, provides high-quality solutions, and is flexible, proving able to adapt to different applications with minimum overhead. This approach is tested in two unrelated inverse problems: super-resolution microscopy and quantitative phase imaging, yielding comparable or superior results to previous methods and fine-tuned diffusion models. We conclude that fine-tuning the schedule by experimentation should be avoided because it can be learned during training in a stable way that yields better results.

Paper Structure

This paper contains 36 sections, 74 equations, 17 figures, 4 tables, and 2 algorithms.

Figures (17)

  • Figure 1: QPI methods evaluated on the synthetic dataset (a) and on brightfield clinical microscopy showing epithelial cells (b). From left to right: the first column displays the defocused image at distance $d$ ($I_d$), with the respective ground truth (GT) directly below. The second, third, and fourth columns each represent a different method, with the reconstruction on top and the error image at the bottom.
  • Figure 2: Schedules and sample mean and deviations for the represented images from the BioSR dataset. (a) Schedule ($\beta$) values for a microtubule image. The graph shows the average of the pixels in the respective region. (b) Mean and standard deviations for microtubule (top row) and endoplasmic reticulum (bottom row). The images were reconstructed using 20 samples obtained with CVDM.
  • Figure 3: Architecture of $\lambda_\phi(\mathbf{x})$.
  • Figure 4: Architecture of the score predictor used in BioSR and QPI.
  • Figure 5: Training of Denoising Model $\mathbf{\hat{{\epsilon}}}_\nu(\mathbf{z}_t(\epsilon), t,\mathbf{x})$
  • ...and 12 more figures