Diffusion Models and Representation Learning: A Survey
Michael Fuest, Pingchuan Ma, Ming Gui, Johannes Schusterbauer, Vincent Tao Hu, Björn Ommer
TL;DR
This survey maps the evolving nexus between diffusion models and representation learning, clarifying how diffusion denoising fosters semantic representations and how representations can, in turn, guide diffusion in a self-supervised manner. It introduces a taxonomy and generalized frameworks for extracting diffusion-based features, transferring them to downstream tasks, and jointly training or augmenting diffusion models with discriminative objectives. The work highlights methods based on intermediate activations, knowledge distillation, latent reconstructions, joint modeling, and generative augmentation, and details assignment-based and representation-based guidance strategies. It also discusses key challenges, such as computational demands and interpretability, and outlines future directions, such as architecture innovations and flow-matching paradigms, to advance diffusion-based representation learning.
Abstract
Diffusion models are popular generative modeling methods that have attracted significant attention across a range of vision tasks. Because they do not depend on label annotation, they can be considered a distinct instance of self-supervised learning. This survey explores the interplay between diffusion models and representation learning. It provides an overview of the essential aspects of diffusion models, including mathematical foundations, popular denoising network architectures, and guidance methods. It then details approaches that connect diffusion models and representation learning: frameworks that leverage representations learned by pre-trained diffusion models for subsequent recognition tasks, and methods that use advances in representation and self-supervised learning to enhance diffusion models. The survey aims to offer a comprehensive taxonomy of the interplay between diffusion models and representation learning, identifying key areas of existing concern and potential exploration. GitHub link: https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy
