Prototype-Guided Diffusion: Visual Conditioning without External Memory
Bilal Faye, Hanane Azzag, Mustapha Lebbah
TL;DR
This paper addresses the inefficiency and rigidity of retrieval-based conditioning in diffusion models by proposing the Prototype Diffusion Model (PDM), which learns and updates visual prototypes online within the denoising process. Prototypes are learned without supervision from clean features using contrastive objectives and are aligned with noisy representations to guide generation, removing the need for external memory. A supervised variant, s-PDM, fixes class prototypes when labels are available, further improving semantic fidelity. Across multiple datasets, PDM and s-PDM outperform DDPM and ProtoDiffusion on FID and KID, demonstrating scalable, memory-free conditioning with strong semantic grounding.
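The online prototype-learning step summarized above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the InfoNCE-style loss over prototypes, the nearest-prototype assignment, the exponential-moving-average (EMA) update, and all names (`prototype_contrastive_loss`, `ema_update`, `tau`, `momentum`) are hypothetical choices, and the paper's exact contrastive objective and update rule may differ. Passing `labels` mimics one possible reading of s-PDM, in which class labels select the positive prototype.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Project vectors onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def prototype_contrastive_loss(z, prototypes, tau=0.1, labels=None):
    """Hypothetical InfoNCE-style loss pulling each clean feature toward one prototype.

    z: (B, D) clean features; prototypes: (K, D).
    labels: optional (B,) class ids. If given, they choose the positive prototype
    (an assumed reading of the supervised s-PDM variant); otherwise the nearest
    prototype is treated as the positive.
    """
    zn = l2_normalize(z)
    pn = l2_normalize(prototypes)
    logits = zn @ pn.T / tau  # (B, K) temperature-scaled cosine similarities
    if labels is None:
        assign = logits.argmax(axis=1)  # unsupervised: nearest prototype
    else:
        assign = np.asarray(labels)
    # Numerically stable log-softmax over the K prototypes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(zn)), assign].mean()
    return loss, assign


def ema_update(prototypes, z, assign, momentum=0.9):
    """Hypothetical online update: move each used prototype toward its assigned features."""
    zn = l2_normalize(z)
    new = prototypes.copy()
    for k in np.unique(assign):
        mean_k = zn[assign == k].mean(axis=0)
        new[k] = momentum * prototypes[k] + (1.0 - momentum) * mean_k
    return l2_normalize(new)  # keep every prototype on the unit sphere
```

Under these assumptions, a training loop would apply `prototype_contrastive_loss` to encoder features of clean images and call `ema_update` after each batch. The prototype matrix stays at a fixed size K regardless of dataset size, which is the sense in which no external memory bank is required.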
Abstract
Diffusion models achieve state-of-the-art image generation but remain computationally costly because of iterative denoising. Latent-space models such as Stable Diffusion reduce this overhead but lose fine detail, while retrieval-augmented methods improve efficiency at the cost of large memory banks, static similarity models, and rigid infrastructure. We introduce the Prototype Diffusion Model (PDM), which embeds prototype learning into the diffusion process to provide adaptive, memory-free conditioning. Instead of retrieving references, PDM learns compact visual prototypes from clean features via contrastive learning and then aligns noisy representations with semantically relevant patterns during denoising. Experiments show that PDM maintains high generation quality while lowering computational and storage costs, offering a scalable alternative to retrieval-based conditioning.
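The alignment of noisy representations with learned prototypes can be illustrated in the same hedged way. In the sketch below, which reflects our assumptions rather than the paper's architecture, a noisy feature attends softly over the K prototypes to produce a conditioning vector, and a KL-divergence term pulls the noisy feature's distribution over prototypes toward that of its clean counterpart. The names `prototype_condition` and `alignment_kl`, the temperature `tau`, and the choice of KL divergence as the alignment loss are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize so that dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def prototype_condition(h, prototypes, tau=0.1):
    """Hypothetical soft attention of features over prototypes.

    h: (B, D) features (clean or noisy); prototypes: (K, D).
    Returns (weights, cond): weights is (B, K) with rows summing to 1;
    cond is the (B, D) prototype mixture a denoiser could take as conditioning.
    """
    sims = l2_normalize(h) @ l2_normalize(prototypes).T / tau
    weights = softmax(sims, axis=1)
    return weights, weights @ prototypes


def alignment_kl(w_clean, w_noisy, eps=1e-12):
    """Mean KL(clean || noisy) over the batch; pushes noisy assignments toward clean ones."""
    ratio = np.log(w_clean + eps) - np.log(w_noisy + eps)
    return float(np.mean(np.sum(w_clean * ratio, axis=1)))
```

In this sketch, the denoiser would receive `cond` alongside the noisy input, and the clean-feature weights act as a fixed target for `alignment_kl`. Because the prototypes replace retrieved neighbors, conditioning reduces to a (B, K) similarity computation rather than a nearest-neighbor search over a memory bank.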
