Leveraging Multimodal Diffusion Models to Accelerate Imaging with Side Information

Timofey Efimov, Harry Dong, Megna Shah, Jeff Simmons, Sean Donegan, Yuejie Chi

TL;DR

This work proposes a framework for training a multimodal diffusion model over the joint modalities, which turns inverse problems with black-box forward models into simple linear inpainting problems. By leveraging the available side information, it achieves superior image reconstruction while requiring significantly less data from the expensive microscopy modality.

Abstract

Diffusion models have found phenomenal success as expressive priors for solving inverse problems, but their extension beyond natural images to more structured scientific domains remains limited. Motivated by applications in materials science, we aim to reduce the number of measurements required from an expensive imaging modality of interest by leveraging side information from an auxiliary modality that is much cheaper to obtain. To deal with the non-differentiable and black-box nature of the forward model, we propose a framework to train a multimodal diffusion model over the joint modalities, turning inverse problems with black-box forward models into simple linear inpainting problems. Numerically, we demonstrate the feasibility of training diffusion models over materials imagery data, and show that our approach achieves superior image reconstruction by leveraging the available side information, while requiring significantly less data from the expensive microscopy modality.
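To make the "linear inpainting" reframing concrete, the following is a minimal sketch of the measurement operator on the concatenated modalities. It is not the authors' implementation: the function names, the image size, the channel counts (three Euler-angle channels for EBSD, one channel for PL), and the uniformly random pixel mask are all assumptions chosen for illustration. The point is only that, once both modalities are stacked, the observation is an elementwise mask, which is linear in the joint image even when the forward model $f$ is a black box.

```python
import numpy as np

def joint_inpainting_mask(ebsd_shape, pl_shape, ebsd_fraction, rng):
    """Boolean mask over the channel-wise concatenation [EBSD, PL].

    Assumption: EBSD is observed at a random subset of pixels (all channels
    of a chosen pixel together), while PL is observed everywhere.
    """
    c1, h, w = ebsd_shape
    c2, h2, w2 = pl_shape
    assert (h, w) == (h2, w2), "both modalities must share the pixel grid"
    pixel_mask = rng.random((h, w)) < ebsd_fraction      # which EBSD pixels are measured
    ebsd_mask = np.broadcast_to(pixel_mask, (c1, h, w))  # same pixels for every channel
    pl_mask = np.ones((c2, h, w), dtype=bool)            # cheap modality: fully observed
    return np.concatenate([ebsd_mask, pl_mask], axis=0)

def observe(joint_image, mask):
    """Linear measurement: keep observed entries, zero out the rest."""
    return np.where(mask, joint_image, 0.0)

rng = np.random.default_rng(0)
ebsd = rng.random((3, 16, 16))   # hypothetical stand-in for an EBSD image
pl = rng.random((1, 16, 16))     # hypothetical stand-in for the paired PL image
joint = np.concatenate([ebsd, pl], axis=0)

mask = joint_inpainting_mask(ebsd.shape, pl.shape, ebsd_fraction=0.02, rng=rng)
y = observe(joint, mask)         # what a joint-model inpainting sampler would condition on
```

A diffusion sampler trained on the joint images would then fill in the masked EBSD entries; that sampler is outside the scope of this sketch.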

Paper Structure (9 sections, 9 equations, 6 figures)

Figures (6)

  • Figure 1: Method overview. The observed Modality 2 (e.g., PL) is the result of applying a black-box forward model $f$ to Modality 1 (e.g., EBSD), which can only be partially observed. Because the forward model is unknown, the unimodal model treats this as an inpainting problem and reconstructs Modality 1 poorly. Our multimodal diffusion method concatenates both modalities to reframe the (possibly nonlinear) inverse problem as a linear inpainting problem, producing better results.
  • Figure 2: Examples of generated images from our multimodal diffusion model. We verify that the generated PL data is highly consistent with the PL data obtained by passing the generated EBSD image through the black-box forward model. From left to right, the images are the generated EBSD, the generated PL, the PL from applying the forward model to the generated EBSD, and the relative $\ell_2$ consistency error between the two PL columns.
  • Figure 3: Comparison of unimodal and multimodal performance (measured by Euler angle disorientation) as a function of the fraction of observed EBSD entries. We use three sizes of the unimodal UNet with increasing hidden dimension, the largest being the same size as the multimodal UNet. Lower disorientation is better.
  • Figure 4: Performance comparison of our multimodal diffusion model for different amounts of PL measurement noise injected while observing 2% noiseless EBSD. Lower disorientation is better. Our multimodal model is robust to noise in PL observations, as performance across varying noise levels is similar to the noiseless case.
  • Figure 5: Uncertainty quantification of the reconstruction error across 20 EBSD images generated from the same observations, at different fractions of observed EBSD entries. The full noiseless PL image is observed in all cases.
  • ...and 1 more figures
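
The consistency check described in the Figure 2 caption can be sketched as follows. This is an illustrative version only: the caption does not give the exact formula, so normalizing by the norm of the forward-model output is an assumption, and `toy_forward_model` is a hypothetical stand-in for the real black-box forward model, which is not available here.

```python
import numpy as np

def relative_l2_error(pl_generated, pl_from_forward, eps=1e-12):
    """Relative l2 error ||a - b||_2 / ||b||_2 over all entries.

    Assumption: the error is normalized by the forward-model output b.
    """
    diff = np.linalg.norm(pl_generated - pl_from_forward)
    ref = np.linalg.norm(pl_from_forward)
    return diff / (ref + eps)  # eps guards against division by zero

def toy_forward_model(ebsd):
    """Hypothetical stand-in for the black-box map from EBSD to PL."""
    return np.cos(ebsd).sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
ebsd_gen = rng.random((3, 8, 8))                 # stand-in for a generated EBSD image
pl_ref = toy_forward_model(ebsd_gen)             # PL implied by the generated EBSD
pl_gen = pl_ref + 0.01 * rng.standard_normal(pl_ref.shape)  # stand-in for generated PL

err = relative_l2_error(pl_gen, pl_ref)
print(f"relative l2 consistency error: {err:.4f}")
```

A small value indicates that the two generated modalities agree with each other under the forward model, which is the property the figure is meant to verify.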