Patronus: Bringing Transparency to Diffusion Models with Prototypes

Nina Weng, Aasa Feragen, Siavash Bigdeli

TL;DR

Patronus inserts a ProtoPNet-inspired prototypical encoder into DDPMs to achieve intrinsic interpretability in diffusion-based image generation. It learns localized visual prototypes and conditions generation via a compact prototype activation vector, enabling visualization, manipulation, and diagnosis of learned semantics without extra annotations. The approach demonstrates that prototypes capture meaningful semantic patterns, supports controllable editing through prototype activations, and aids in identifying unwanted correlations in training data, with competitive generation and latent-quality metrics across multiple datasets. By coupling local semantic prototypes with diffusion guidance, Patronus offers a practical pathway to transparent, controllable, and diagnosable diffusion models with potential applications in bias detection and responsible deployment.

Abstract

Diffusion-based generative models, such as Denoising Diffusion Probabilistic Models (DDPMs), have achieved remarkable success in image generation, but their step-by-step denoising process remains opaque, leaving critical aspects of the generation mechanism unexplained. To address this, we introduce Patronus, an interpretable diffusion model inspired by ProtoPNet. Patronus integrates a prototypical network into DDPMs, enabling the extraction of prototypes and conditioning of the generation process on their prototype activation vector. This design enhances interpretability by showing the learned prototypes and how they influence the generation process. Additionally, the model supports downstream tasks like image manipulation, enabling more transparent and controlled modifications. Moreover, Patronus could reveal shortcut learning in the generation process by detecting unwanted correlations between learned prototypes. Notably, Patronus operates entirely without any annotations or text prompts. This work opens new avenues for understanding and controlling diffusion models through prototype-based interpretability. Our code is available at https://github.com/nina-weng/patronus.
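To make the idea of a prototype activation vector concrete, here is a minimal NumPy sketch. It is not taken from the Patronus code base: the function name `prototype_activations`, the tensor shapes, the ProtoPNet-style log similarity, and the max-pooling over patches are all assumptions chosen to illustrate how a ProtoPNet-inspired encoder might turn patch features into one activation per prototype.

```python
import numpy as np

def prototype_activations(patch_features, prototypes, eps=1e-4):
    """Hypothetical helper: one activation per prototype for a single image.

    patch_features: (H, W, D) array of encoder features, one vector per patch.
    prototypes:     (M, D) array of learned prototype vectors.
    Returns (s, locations): s has shape (M,), locations has shape (M, 2).
    """
    H, W, D = patch_features.shape
    flat = patch_features.reshape(H * W, D)

    # Squared L2 distance between every patch and every prototype: (H*W, M).
    diff = flat[:, None, :] - prototypes[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)

    # ProtoPNet-style similarity (assumed here): large when the distance is small.
    similarity = np.log((dist2 + 1.0) / (dist2 + eps))

    # Max-pool over patches: each prototype keeps its best-matching patch.
    s = similarity.max(axis=0)
    best_patch = similarity.argmax(axis=0)
    rows, cols = np.unravel_index(best_patch, (H, W))
    return s, np.stack([rows, cols], axis=1)

# Toy usage with random features; prototype 0 is made to match patch (2, 1).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 8))
protos = rng.normal(size=(3, 8))
protos[0] = features[2, 1]
s, locations = prototype_activations(features, protos)
```

In this sketch, `s` plays the role of the conditioning vector and `locations` records where in the image each prototype is most strongly activated, which is the kind of information a patch-level visualization would draw on.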

Paper Structure

This paper contains 60 sections, 1 theorem, 16 equations, 26 figures, 6 tables.

Key Result

Proposition 1

Optimizing the conditioning variable $s$ with the denoising loss cannot degrade the quality of the results.
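
One way to read this statement, offered as a hedged sketch rather than the paper's own proof: write $\mathcal{L}(\theta, s)$ for the denoising loss of a denoiser with parameters $\theta$ conditioned on $s$, and let $s_0$ be any fixed, uninformative value of the condition. Both the notation and $s_0$ are assumptions introduced here for illustration. Treating $s$ as a freely optimized variable, minimizing over it as well can only match or lower the attainable loss:

```latex
\min_{\theta,\, s} \mathcal{L}(\theta, s) \;\le\; \min_{\theta} \mathcal{L}(\theta, s_0)
```

The inequality holds because the right-hand side restricts the left-hand minimization to the single choice $s = s_0$. It concerns the attainable optimum of the training objective, not what a particular optimizer reaches in practice.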

Figures (26)

  • Figure 1: Interpretability, manipulation, and diagnostic capabilities of Patronus. Interpretability: By integrating a prototypical network as the encoder, Patronus learns semantic prototypes ("what") and explains the generative process by revealing where and when they emerge. Manipulation: By adjusting the prototype activation vector, Patronus enables targeted semantic image editing. Diagnosis: Semantic image editing may reveal unwanted correlations learned from the training data during generation. Note that Patronus operates without any annotations or text prompts.
  • Figure 2: Overview of Patronus: contains a prototypical network for prototype extraction and a conditional DDPM for generation.
  • Figure 3: Reconstruction and variations with fixed $s$, random $x_T$.
  • Figure 4: Interpolation between two images. First row: CheXpert (Irvin et al., 2019), from a 75-year-old female without an enlarged heart (left) to a 27-year-old male with an enlarged heart (right). Last two rows: CelebA.
  • Figure 5: Visualization of selected prototypes and their semantic interpretations. Here, $x_0$ denotes the original image and $\hat{x}_{0,s'}$ denotes the image generated under condition $s'$, where $s'$ is the prototype activation vector with the $j$-th prototype enhanced. The red rectangle highlights the most activated patch in $\hat{x}_{0,s'}$, which is taken as the visual representation of the chosen prototype and is also shown in the third row. Note that prototype semantics are not pre-annotated but inferred through observation.
  • ...and 21 more figures
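
The captions above describe three operations on the conditioning inputs: enhancing the $j$-th entry of the activation vector (Figure 5), moving between the vectors of two images (Figure 4), and keeping $s$ fixed while redrawing $x_T$ (Figure 3). The NumPy sketch below illustrates only how those inputs could be prepared. The function names are hypothetical, linear interpolation of $s$ is an assumption (the captions do not state the interpolation scheme), and the conditional DDPM sampler that would consume these inputs is deliberately left out.

```python
import numpy as np

def enhance_prototype(s, j, value):
    """Return a copy s' of s with the j-th prototype activation set to `value`."""
    s_prime = np.array(s, dtype=float, copy=True)
    s_prime[j] = value
    return s_prime

def interpolate_activations(s_a, s_b, num_steps):
    """Linear path from s_a to s_b (assumed scheme), shape (num_steps, M)."""
    s_a = np.asarray(s_a, dtype=float)
    s_b = np.asarray(s_b, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - alphas) * s_a[None, :] + alphas * s_b[None, :]

def variation_inputs(s, num_variations, image_shape, rng):
    """Pair one fixed s with several independent Gaussian draws of x_T."""
    s = np.asarray(s, dtype=float)
    return [(s, rng.standard_normal(image_shape)) for _ in range(num_variations)]

# Toy usage with a 3-prototype activation vector.
s_a = np.array([0.1, 0.2, 0.3])
s_b = np.array([0.9, 0.2, 0.0])
s_edit = enhance_prototype(s_a, j=1, value=5.0)        # manipulation
path = interpolate_activations(s_a, s_b, num_steps=5)  # interpolation
pairs = variation_inputs(s_a, 4, (3, 8, 8), np.random.default_rng(0))
```

Each row of `path`, the edited vector `s_edit`, and each `(s, x_T)` pair would then be passed to the conditional sampler. Comparing the outputs shows which semantics follow the condition and which vary with the initial noise.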

Theorems & Definitions (2)

  • Proposition
  • proof