Schrödinger Bridge Type Diffusion Models as an Extension of Variational Autoencoders

Kentaro Kaba, Reo Shimizu, Masayuki Ohzeki, Yuki Sughiyama

TL;DR

This work proposes a unified framework to construct diffusion models by reinterpreting the SB-type models as an extension of variational autoencoders and finds that the objective function consists of the prior loss and drift matching parts.

Abstract

Generative diffusion models use time-forward and backward stochastic differential equations to connect the data and prior distributions. While conventional diffusion models (e.g., score-based models) only learn the backward process, more flexible frameworks have been proposed to also learn the forward process by employing the Schrödinger bridge (SB). However, due to the complexity of the mathematical structure behind SB-type models, it is difficult to give an intuitive interpretation of their objective function. In this work, we propose a unified framework to construct diffusion models by reinterpreting the SB-type models as an extension of variational autoencoders. In this context, the data processing inequality plays a crucial role. As a result, we find that the objective function consists of the prior loss and drift matching parts.
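
To make the role of the forward SDE concrete, the following is a minimal sketch of how a time-forward SDE transports data toward a simple prior. It is an illustration, not the paper's method: the fixed Ornstein-Uhlenbeck drift stands in for a drift that an SB-type model would learn, and the function name `simulate_forward_sde`, the parameters `beta`, `T`, `n_steps`, and the bimodal toy data are all assumptions introduced here.

```python
import math
import random


def simulate_forward_sde(x0, beta=2.0, T=5.0, n_steps=500, rng=None):
    """Euler-Maruyama integration of dX = -0.5*beta*X dt + sqrt(beta) dW.

    Hypothetical helper: the drift is fixed (as in score-based models);
    an SB-type model would replace it with a learned network.
    """
    if rng is None:
        rng = random.Random(0)
    dt = T / n_steps
    noise_scale = math.sqrt(beta * dt)  # standard deviation of one noise increment
    x = x0
    for _ in range(n_steps):
        drift = -0.5 * beta * x  # pulls the sample toward the origin
        x += drift * dt + noise_scale * rng.gauss(0.0, 1.0)
    return x


# Toy "data" distribution: two point masses at -2 and +2 (an assumption).
rng = random.Random(0)
data = [rng.choice([-2.0, 2.0]) for _ in range(2000)]

# Push every data point through the forward SDE up to time T.
samples = [simulate_forward_sde(x, rng=rng) for x in data]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"terminal mean = {mean:.3f}, terminal variance = {var:.3f}")
```

With these settings the terminal samples should be close to a standard normal, which is the stationary distribution of this particular SDE. In a conventional diffusion model that stationary law plays the role of the prior; learning the forward drift, as SB-type models do, removes the need to run the process long enough for it to forget the data.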

Paper Structure

This paper contains 6 sections, 25 equations, 1 figure.

Figures (1)

  • Figure 1: The top and middle SDEs with the blue shadow denote the encode SDE with NN $u_\phi$ and its time reversal, which connect the data $\mu$ and $p_\phi$. By contrast, the bottom SDE with the red shadow represents the decode SDE with the other NN $s_\theta$, which transports the prior $\pi$ to $q_\theta$. The first term in Eq. \ref{eq:objective_function_DM_reformed} means the prior loss between $p_\phi$ and $\pi$ (see blue dotted squares); the second term indicates the drift matching between the reverse-encode and the decode SDEs (see red dotted squares).
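
The two-term structure described in the caption can be sketched with standard results, under assumptions that are not taken from the paper: both the reverse-encode and the decode SDEs share a diffusion coefficient $g(t)$ on $[0, T]$, and the symbols $\mathbb{P}_\phi$, $\mathbb{Q}_\theta$ (path measures of the encode and decode processes), $b^{\mathrm{rev}}_\phi$ and $b_\theta$ (their reverse-time drifts) are notation introduced here for illustration. The data processing inequality bounds the divergence between the time-$0$ marginals by the divergence between the path measures:

$$
D_{\mathrm{KL}}(\mu \,\|\, q_\theta) \;\le\; D_{\mathrm{KL}}(\mathbb{P}_\phi \,\|\, \mathbb{Q}_\theta).
$$

Applying the chain rule at the terminal time together with Girsanov's theorem then splits the right-hand side into a terminal-marginal term and a drift-mismatch term:

$$
D_{\mathrm{KL}}(\mathbb{P}_\phi \,\|\, \mathbb{Q}_\theta)
= \underbrace{D_{\mathrm{KL}}(p_\phi \,\|\, \pi)}_{\text{prior loss}}
+ \underbrace{\int_0^T \mathbb{E}_{\mathbb{P}_\phi}\!\left[ \frac{\lVert b^{\mathrm{rev}}_\phi(X_t, t) - b_\theta(X_t, t) \rVert^2}{2\, g(t)^2} \right] dt}_{\text{drift matching}}.
$$

This is only a schematic of why a prior loss and a drift-matching term appear together; the exact parameterization, divergence direction, and normalization in Eq. \ref{eq:objective_function_DM_reformed} may differ.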