Patronus: Bringing Transparency to Diffusion Models with Prototypes
Nina Weng, Aasa Feragen, Siavash Bigdeli
TL;DR
Patronus inserts a ProtoPNet-inspired prototypical encoder into DDPMs to achieve intrinsic interpretability in diffusion-based image generation. It learns localized visual prototypes and conditions generation via a compact prototype activation vector, enabling visualization, manipulation, and diagnosis of learned semantics without extra annotations. The approach demonstrates that prototypes capture meaningful semantic patterns, supports controllable editing through prototype activations, and aids in identifying unwanted correlations in training data, with competitive generation and latent-quality metrics across multiple datasets. By coupling local semantic prototypes with diffusion guidance, Patronus offers a practical pathway to transparent, controllable, and diagnosable diffusion models with potential applications in bias detection and responsible deployment.
Abstract
Diffusion-based generative models, such as Denoising Diffusion Probabilistic Models (DDPMs), have achieved remarkable success in image generation, but their step-by-step denoising process remains opaque, leaving critical aspects of the generation mechanism unexplained. To address this, we introduce Patronus, an interpretable diffusion model inspired by ProtoPNet. Patronus integrates a prototypical network into DDPMs, enabling the extraction of prototypes and conditioning of the generation process on their prototype activation vector. This design enhances interpretability by showing the learned prototypes and how they influence the generation process. Additionally, the model supports downstream tasks like image manipulation, enabling more transparent and controlled modifications. Moreover, Patronus could reveal shortcut learning in the generation process by detecting unwanted correlations between learned prototypes. Notably, Patronus operates entirely without any annotations or text prompts. This work opens new avenues for understanding and controlling diffusion models through prototype-based interpretability. Our code is available at https://github.com/nina-weng/patronus.
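To make the notion of a prototype activation vector concrete, the following is a minimal sketch and not the Patronus implementation. It assumes a ProtoPNet-style scoring rule: each prototype is compared with every local patch feature using squared L2 distance, the distance is mapped to a similarity with the log-ratio used in ProtoPNet, and the maximum over patches is kept, giving one activation per prototype. The function name `prototype_activations`, the toy feature values, and the choice of similarity function are illustrative assumptions; the similarity actually used by Patronus, and the way the resulting vector conditions the DDPM, may differ and are not shown here.

```python
import math


def prototype_activations(patches, prototypes, eps=1e-4):
    """Return one activation per prototype (hypothetical helper).

    patches:    list of local feature vectors, one per spatial location.
    prototypes: list of prototype vectors with the same dimensionality.
    Each activation is the maximum similarity between that prototype and
    any patch, so a prototype "fires" if it matches anywhere in the image.
    """
    activations = []
    for proto in prototypes:
        best = -math.inf
        for patch in patches:
            # Squared L2 distance between the patch feature and the prototype.
            d2 = sum((a - b) ** 2 for a, b in zip(patch, proto))
            # ProtoPNet-style similarity (assumed here): large when d2 is small.
            sim = math.log((d2 + 1.0) / (d2 + eps))
            best = max(best, sim)
        activations.append(best)
    return activations


# Toy example: four 2-D patch features and two prototypes.
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototypes = [[1.0, 0.0], [5.0, 5.0]]
acts = prototype_activations(patches, prototypes)
```

In this toy setup the first prototype coincides with one of the patches and therefore receives a much higher activation than the second, which lies far from every patch. A vector of this kind, with one entry per prototype, is the compact signal on which generation would be conditioned; raising or lowering a single entry is the sort of manipulation the abstract refers to.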
