
Optical Diffusion Models for Image Generation

Ilker Oguz, Niyazi Ulas Dinc, Mustafa Yildirim, Junjie Ke, Innfarn Yoo, Qifei Wang, Feng Yang, Christophe Moser, Demetri Psaltis

TL;DR

This study demonstrates that the propagation of a light beam through a semi-transparent medium can be programmed to implement a denoising diffusion model on image samples. The approach enables high-speed image generation with minimal power consumption, benefiting from the bandwidth and energy efficiency of optical information processing.

Abstract

Diffusion models generate new samples by progressively reducing the noise in a sample drawn from an initial random distribution. This inference procedure generally invokes a trained neural network many times to obtain the final output, incurring significant latency and energy consumption on digital electronic hardware such as GPUs. In this study, we demonstrate that the propagation of a light beam through a semi-transparent medium can be programmed to implement a denoising diffusion model on image samples. This framework projects noisy image patterns through passive diffractive optical layers, which collectively transmit only the predicted noise term in the image. The transparent optical layers are trained with an online approach that backpropagates the error to the analytical model of the system; they are passive and kept the same across the different denoising steps. Hence, this method enables high-speed image generation with minimal power consumption, benefiting from the bandwidth and energy efficiency of optical information processing.

Paper Structure (20 sections, 11 equations, 10 figures, 1 table, 1 algorithm)


Figures (10)

  • Figure 1: Comparison between conventional and proposed methods of image generation based on diffusion models. The conventional method runs on digital electronics based computing units such as GPUs or TPUs. The proposed method utilizes an optical denoising unit that is formed by passive optical layers. The image to be denoised is sent to the system with a modulator and the output is read out with a detector.
  • Figure 2: The main operating principle of the ODU. Consecutive modulation and free-space propagation events can be represented as multiplication and convolution operations, respectively. When the input beam $U_0(x,y)$, which is patterned with the noisy input image $r_t$, is introduced to the ODU, the output intensity pattern $|U_f(x,y)|^2$ corresponds to the trained optical system's prediction of the noise component in the input pattern, $\epsilon_\theta(r_t)$.
  • Figure 3: Images generated by the Optical Diffusion Model at different timesteps and when trained with various datasets. The generated images and their corresponding Inception and FID scores, calculated between timesteps $T=10$ and $T=950$, are acquired after training with the MNIST digits dataset. Final outputs at time $T=1000$ have FID = $206.6$ for ODUs trained on MNIST digits, FID = $227.7$ for Fashion MNIST, and FID = $131.4$ for Quick, Draw!.
  • Figure 4: Scaling of denoising capability (left) and generation performance (right) with output image resolution for Optical Diffusion, a purely digital convolutional U-Net, and fully connected networks.
  • Figure 5: The dependency of denoising performance (MSE) and generation quality scores (FID, KID and Inception score) on the hyperparameters of the ODUs: the number of pixels in the optical modulation layers, the number of modulation layers, and the number of denoising layer sets ($M$).
  • ...and 5 more figures
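
The operating principle described in the Figure 2 caption (modulation as multiplication, free-space propagation as convolution, intensity readout at the detector) can be illustrated with a small numerical sketch. This is not the authors' implementation. The angular-spectrum propagator, the amplitude encoding of $r_t$, the propagate-then-modulate ordering, the random (untrained) phase masks, and all numeric values (grid size, pixel pitch, wavelength, layer spacing) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def transfer_function(n, pixel_pitch, wavelength, distance):
    """Angular-spectrum transfer function for free-space propagation."""
    fx = np.fft.fftfreq(n, d=pixel_pitch)              # spatial frequencies (1/m)
    fxx, fyy = np.meshgrid(fx, fx, indexing="ij")
    # Squared longitudinal frequency; non-positive values are evanescent waves.
    kz_sq = (1.0 / wavelength) ** 2 - fxx ** 2 - fyy ** 2
    propagating = kz_sq > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, kz_sq, 0.0))
    return np.where(propagating, np.exp(1j * kz * distance), 0.0)

def propagate(field, h):
    """Free-space propagation: a convolution, computed as a product in Fourier space."""
    return np.fft.ifft2(np.fft.fft2(field) * h)

def odu_forward(r_t, phase_masks, h):
    """Return the output intensity |U_f|^2 for a noisy input image r_t."""
    u = r_t.astype(np.complex128)      # U_0: amplitude-encoded input (assumption)
    for phi in phase_masks:
        u = propagate(u, h)            # propagation = convolution
        u = u * np.exp(1j * phi)       # modulation = elementwise multiplication
    u = propagate(u, h)                # final propagation to the detector plane
    return np.abs(u) ** 2              # the detector records intensity only

# Illustrative values only; none of these come from the paper.
n = 64
rng = np.random.default_rng(0)
r_t = rng.random((n, n))                                         # stand-in noisy image
masks = [rng.uniform(0, 2 * np.pi, (n, n)) for _ in range(4)]    # untrained layers
h = transfer_function(n, pixel_pitch=8e-6, wavelength=850e-9, distance=0.01)
intensity = odu_forward(r_t, masks, h)
```

In a trained system the phase masks would be optimized so that `intensity` approximates the noise component $\epsilon_\theta(r_t)$; here they are random, so the output only demonstrates the structure of the forward pass. Because the masks are phase-only and all sampled spatial frequencies propagate at these values, the sketch conserves total optical power between input and output.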