Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Yeongmin Kim, Kwanghyeon Lee, Minsang Park, Byeonghu Na, Il-Chul Moon
TL;DR
This work tackles unsupervised representation learning with diffusion models by addressing the information-split problem that arises when an auxiliary encoder and a fixed diffusion endpoint both carry information about the data. It introduces Diffusion Bridge AutoEncoders (DBAE), which impose a ${\mathbf{z}}$-dependent endpoint ${\mathbf{x}}_T$ through a forward SDE augmented by Doob's $h$-transform, making ${\mathbf{z}}$ an information bottleneck. The authors derive an entropy-regularized score-matching objective that jointly optimizes reconstruction and a learnable generative prior, with theoretical guarantees linking the objective to mutual information and KL bounds. Empirically, DBAE improves downstream inference, reconstruction fidelity, disentanglement, and unconditional generation compared to prior diffusion-based methods, while enabling efficient interpolation and attribute manipulation. This approach advances learnable diffusion representations and provides a solid foundation for downstream tasks requiring informative, compact latent variables.
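For readers unfamiliar with Doob's $h$-transform, the generic textbook form is sketched below. The symbols ${\mathbf{f}}$, $g$, $h$, and ${\mathbf{y}}$ are standard notation assumed here for illustration and are not taken from the paper. In the setting described above, the pinned endpoint ${\mathbf{y}}$ would presumably play the role of the ${\mathbf{z}}$-dependent ${\mathbf{x}}_T$.

$$
\begin{aligned}
\text{reference SDE:}\quad & d{\mathbf{x}}_t = {\mathbf{f}}({\mathbf{x}}_t, t)\,dt + g(t)\,d{\mathbf{w}}_t, \\
\text{pinned at } {\mathbf{x}}_T = {\mathbf{y}}\text{:}\quad & d{\mathbf{x}}_t = \left[{\mathbf{f}}({\mathbf{x}}_t, t) + g(t)^2 \nabla_{{\mathbf{x}}_t} \log h({\mathbf{x}}_t, t, {\mathbf{y}}, T)\right] dt + g(t)\,d{\mathbf{w}}_t, \\
\text{where}\quad & h({\mathbf{x}}, t, {\mathbf{y}}, T) = p({\mathbf{x}}_T = {\mathbf{y}} \mid {\mathbf{x}}_t = {\mathbf{x}}).
\end{aligned}
$$

The added drift term steers every trajectory toward ${\mathbf{y}}$, so the endpoint is no longer drawn from a fixed prior.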
Abstract
Diffusion-based representation learning has attracted substantial attention due to its promising capabilities in latent representation and sample generation. Recent studies have employed an auxiliary encoder to identify a corresponding representation from a sample and to adjust the dimensionality of a latent variable z. However, this auxiliary structure induces an information split problem, because the diffusion model and the auxiliary encoder divide the information from the sample between two representations, one for each model. In particular, the information modeled by the diffusion becomes over-regularized because of the static prior distribution on x_T. To address this problem, we introduce Diffusion Bridge AutoEncoders (DBAE), which enable z-dependent endpoint x_T inference through a feed-forward architecture. This structure creates an information bottleneck at z, so x_T becomes dependent on z in its generation. This has two consequences: 1) z holds the full information of samples, and 2) x_T follows a learnable distribution rather than a static one. We propose an objective function for DBAE that enables both reconstruction and generative modeling, with theoretical justification. Empirical evidence supports the effectiveness of the intended design in DBAE, which notably enhances downstream inference quality, reconstruction, and disentanglement. Additionally, DBAE generates high-fidelity samples in unconditional generation. Our code is available at https://github.com/aailab-kaist/DBAE.
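The feed-forward path described above (sample, then z, then a z-dependent endpoint x_T) can be illustrated with a toy sketch. This is not the DBAE implementation: `encode` and `decode_endpoint` are hypothetical stand-ins for learned networks, and the noising process is assumed to be a simple Brownian bridge with unit diffusion, chosen only because its marginals have a closed form.

```python
import math
import random


def encode(x0, latent_dim=2):
    # Hypothetical stand-in for a learned encoder: average x0 in equal chunks.
    chunk = len(x0) // latent_dim
    return [sum(x0[i * chunk:(i + 1) * chunk]) / chunk for i in range(latent_dim)]


def decode_endpoint(z, data_dim):
    # Hypothetical stand-in for a learned decoder: tile z up to the data size.
    # The endpoint x_T is a function of z alone, so z acts as the bottleneck.
    return [z[i % len(z)] for i in range(data_dim)]


def bridge_sample(x0, xT, t, T=1.0, rng=random):
    # Marginal of a Brownian bridge (assumed unit diffusion) pinned at
    # x0 when t = 0 and at xT when t = T.
    a = t / T
    std = math.sqrt(t * (T - t) / T)
    return [(1 - a) * u + a * v + std * rng.gauss(0.0, 1.0) for u, v in zip(x0, xT)]


x0 = [0.2, 0.4, -1.0, 0.6]
z = encode(x0)                      # compact latent
xT = decode_endpoint(z, len(x0))    # z-dependent endpoint
x_mid = bridge_sample(x0, xT, t=0.5)
```

At `t = 0` the sample coincides with `x0` and at `t = T` with `xT`, so both ends of the trajectory are determined by the data and its latent rather than by a fixed Gaussian prior.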
