Prototype-Guided Diffusion: Visual Conditioning without External Memory

Bilal Faye, Hanane Azzag, Mustapha Lebbah

TL;DR

This paper tackles the inefficiency and rigidity of retrieval-based conditioning in diffusion models by proposing the Prototype Diffusion Model (PDM), which learns and updates visual prototypes online within the denoising process. Prototypes are learned without supervision from clean features using contrastive objectives and are aligned with noisy representations to guide generation, eliminating external memory. A supervised variant, s-PDM, fixes class prototypes when labels are available, further boosting semantic fidelity. Empirical results across multiple datasets show that PDM and s-PDM outperform DDPM and ProtoDiffusion on FID and KID, demonstrating scalable, memory-free conditioning with strong semantic grounding.

Abstract

Diffusion models achieve state-of-the-art image generation but remain computationally costly due to iterative denoising. Latent-space models like Stable Diffusion reduce overhead yet lose fine detail, while retrieval-augmented methods improve efficiency but rely on large memory banks, static similarity models, and rigid infrastructures. We introduce the Prototype Diffusion Model (PDM), which embeds prototype learning into the diffusion process to provide adaptive, memory-free conditioning. Instead of retrieving references, PDM learns compact visual prototypes from clean features via contrastive learning, then aligns noisy representations with semantically relevant patterns during denoising. Experiments demonstrate that PDM sustains high generation quality while lowering computational and storage costs, offering a scalable alternative to retrieval-based conditioning.
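
As a rough illustration of the prototype idea described in the abstract, the sketch below assigns a clean feature to its nearest prototype, scores that assignment with a contrastive (softmax cross-entropy) loss, and nudges the chosen prototype toward the feature. This is not the authors' implementation: the cosine similarity, the temperature value, the exponential-moving-average update, and every function name here are assumptions chosen for illustration only.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarities(feature, prototypes):
    """Cosine similarity between one feature and every prototype."""
    f = normalize(feature)
    return [sum(a * b for a, b in zip(f, normalize(p))) for p in prototypes]


def prototype_contrastive_loss(feature, prototypes, temperature=0.1):
    """Return (index of nearest prototype, contrastive loss).

    Assumed loss: cross-entropy of a temperature-scaled softmax over the
    similarities, treating the nearest prototype as the positive and all
    other prototypes as negatives.
    """
    sims = cosine_similarities(feature, prototypes)
    assigned = max(range(len(sims)), key=sims.__getitem__)
    logits = [s / temperature for s in sims]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_norm = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return assigned, log_norm - logits[assigned]


def ema_update(prototype, feature, momentum=0.9):
    """Move a prototype toward a feature (hypothetical online update rule)."""
    return [momentum * p + (1.0 - momentum) * x for p, x in zip(prototype, feature)]


# Toy example: two 2-D prototypes and one clean feature.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
clean_feature = [0.9, 0.1]

index, loss = prototype_contrastive_loss(clean_feature, prototypes)
prototypes[index] = ema_update(prototypes[index], clean_feature)
```

In this toy setup the prototypes live in memory alongside the model rather than in an external retrieval bank, which is the property the abstract emphasizes. How PDM actually defines its objective and its update is specified in the paper's equations and algorithms, not here.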

Paper Structure

This paper contains 16 sections, 22 equations, 2 figures, 2 tables, and 2 algorithms.

Figures (2)

  • Figure 1: PCA of $f_\phi$ features on CIFAR-10. s-PDM shows clearer class separation, while PDM forms semantic clusters without labels.
  • Figure 2: Random $2\times2$ sample generations from models trained on STL-10 after 500 training epochs. Baseline models (DDPM and ProtoDiffusion) produce plausible but less semantically coherent images. Prototype-guided methods (PDM and s-PDM) demonstrate stronger semantic structure, with s-PDM further improving class consistency and visual fidelity. This highlights the impact of jointly learning prototypes on guiding the diffusion process.