
Interpretable Diffusion via Information Decomposition

Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg

TL;DR

This work reframes denoising diffusion models as information channels and derives exact and pointwise mutual information expressions from the MMSE denoiser, enabling per-image and per-pixel analyses of word–image relations. It introduces a natural, non-negative pixel-wise information decomposition that assigns per-coordinate contributions to words, which supports measuring compositional understanding, unsupervised object localization, and intervention analysis. In experiments on COCO-based datasets and the ARO benchmark, conditional mutual information (CMI) predicts image changes under prompt interventions better than attention does, while MI captures nuances for abstract words. The approach yields architecture-agnostic, data-driven interpretable maps and suggests new evaluation metrics for diffusion-based perception and editing, with potential applications in biology and mechanistic interpretability. All estimates rely on pre-trained diffusion models and importance-sampled integrals over the diffusion parameter $\alpha$.
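To make "importance-sampled integrals over $\alpha$" concrete, here is a minimal sketch on a toy problem where everything is known in closed form. It assumes the variance-preserving parametrization $x_\alpha = \sqrt{\sigma(\alpha)}\,x + \sqrt{\sigma(-\alpha)}\,\epsilon$ with $\sigma$ the logistic sigmoid (so $\alpha$ is the log signal-to-noise ratio). It also assumes the identity $I(x;y) = \frac{1}{2}\int \mathbb{E}\,\|\hat\epsilon_\alpha(x_\alpha) - \hat\epsilon_\alpha(x_\alpha \mid y)\|^2\, d\alpha$, which follows from the I-MMSE relation plus the orthogonality principle, and a logistic proposal over $\alpha$. The jointly Gaussian $(x, y)$, the proposal's `loc`/`scale`, and the function names are illustrative choices, not taken from the paper. Because the MMSE noise predictors are exact here, the estimate can be checked against the analytic value $-\frac{1}{2}\log(1-\rho^2)$.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def logistic_pdf(alpha, loc, scale):
    # Density of the logistic proposal, written with sigmoids for numerical stability
    z = (alpha - loc) / scale
    return sigmoid(z) * sigmoid(-z) / scale


def gaussian_mi_estimate(rho, n=200_000, loc=0.0, scale=2.0, seed=0):
    """Monte Carlo estimate of I(x; y) in nats for a toy Gaussian pair.

    Toy model (illustrative assumption): y ~ N(0, 1), x = rho*y + sqrt(1-rho^2)*u.
    """
    rng = np.random.default_rng(seed)
    v = 1.0 - rho**2                                   # Var(x | y)
    y = rng.standard_normal(n)
    x = rho * y + np.sqrt(v) * rng.standard_normal(n)
    eps = rng.standard_normal(n)

    # One log-SNR sample per (x, y, eps) draw, taken from the logistic proposal
    alpha = rng.logistic(loc, scale, n)
    a, b = np.sqrt(sigmoid(alpha)), np.sqrt(sigmoid(-alpha))
    x_alpha = a * x + b * eps                          # a**2 + b**2 = 1 (variance preserving)

    # Exact MMSE noise predictors for this toy model
    eps_hat = b * x_alpha                              # E[eps | x_alpha]
    eps_hat_y = b * (x_alpha - a * rho * y) / (a**2 * v + b**2)  # E[eps | x_alpha, y]

    # Integrand of (1/2) * integral of E[(eps_hat - eps_hat_y)^2] d(alpha), reweighted by 1/q(alpha)
    weighted = 0.5 * (eps_hat - eps_hat_y) ** 2 / logistic_pdf(alpha, loc, scale)
    return float(np.mean(weighted))


if __name__ == "__main__":
    rho = 0.8
    print("estimate:", gaussian_mi_estimate(rho))
    print("analytic:", -0.5 * np.log(1.0 - rho**2))   # about 0.511 nats
```

With a real diffusion model, the two closed-form predictors would be replaced by the network's noise prediction with and without the prompt, and the proposal over $\alpha$ would be tuned so it covers the range where the two predictions differ.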

Abstract

Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque, making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise quantities can be estimated easily as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
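The non-negative pixel-wise decomposition can be illustrated with a small sketch. Assuming the variance-preserving parametrization $x_\alpha = \sqrt{\sigma(\alpha)}\,x + \sqrt{\sigma(-\alpha)}\,\epsilon$, one way to write pointwise information for a single pair $(x, y)$ is $i(x;y) = \frac{1}{2}\int \mathbb{E}_\epsilon \|\hat\epsilon_\alpha(x_\alpha) - \hat\epsilon_\alpha(x_\alpha \mid y)\|^2\, d\alpha$. A squared norm is a sum of per-coordinate squares, so splitting it coordinate by coordinate gives a map whose entries are non-negative and sum to the total. The code takes two noise-predictor callables (in practice, a diffusion model run with and without the word or prompt). `pixelwise_info`, its arguments, the logistic proposal over $\alpha$, and the 4-"pixel" Gaussian toy (where $y$ is a 4-vector and pixel $j$ depends only on $y_j$, with only pixels 1 and 2 actually dependent) are hypothetical stand-ins, not the paper's code.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def pixelwise_info(x, eps_hat_uncond, eps_hat_cond, n_samples=4096,
                   loc=0.0, scale=2.0, seed=0):
    """Per-coordinate pointwise information for one flattened image x.

    eps_hat_uncond / eps_hat_cond: hypothetical callables (x_alpha, alpha) -> noise
    prediction without / with the conditioning y.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.logistic(loc, scale, size=(n_samples, 1))   # log-SNR proposal samples
    eps = rng.standard_normal((n_samples, x.size))
    a, b = np.sqrt(sigmoid(alpha)), np.sqrt(sigmoid(-alpha))
    x_alpha = a * x[None, :] + b * eps                      # noisy copies of x

    diff = eps_hat_cond(x_alpha, alpha) - eps_hat_uncond(x_alpha, alpha)
    z = (alpha - loc) / scale
    q = sigmoid(z) * sigmoid(-z) / scale                    # logistic proposal density
    # Squared differences are non-negative, so every entry of the map is too
    return np.mean(0.5 * diff**2 / q, axis=0)


# Toy usage (illustrative assumption): pixel j depends only on y_j
rho = np.array([0.0, 0.9, 0.9, 0.0])
v = 1.0 - rho**2
rng = np.random.default_rng(1)
y = rng.standard_normal(4)
x = rho * y + np.sqrt(v) * rng.standard_normal(4)


def eps_hat_uncond(x_alpha, alpha):
    # Each x_j is N(0, 1) marginally, so E[eps | x_alpha] = sqrt(sigma(-alpha)) * x_alpha
    return np.sqrt(sigmoid(-alpha)) * x_alpha


def eps_hat_cond(x_alpha, alpha):
    # Given y, x_j ~ N(rho_j * y_j, v_j) independently across pixels
    a, b = np.sqrt(sigmoid(alpha)), np.sqrt(sigmoid(-alpha))
    return b * (x_alpha - a * rho * y) / (a**2 * v + b**2)


info_map = pixelwise_info(x, eps_hat_uncond, eps_hat_cond)
print("per-pixel information:", info_map)
print("total (sum of map):", info_map.sum())
```

Pixels whose values do not depend on $y$ receive exactly zero, which is the property that makes such maps usable for localization. Summing the map recovers the pointwise estimate for the whole image.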

Paper Structure

This paper contains 28 sections, 22 equations, 18 figures, and 10 tables.

Figures (18)

  • Figure 1: We start (left) with a real image from the COCO dataset. We apply a "prompt intervention" (see the section on prompt interventions) to generate a new image. Next, we show conditional mutual information, illustrated using our pixel-wise decomposition, and attention maps for the modified word. The top row shows an image where the prompt intervention has an effect, while in the bottom row it has little effect. Conditional mutual information reflects the effect of the intervention, while attention does not.
  • Figure 2: Examples of localizing different types of words in images. The left half presents noun words, while the right half displays abstract words.
  • Figure 4: Pearson Correlation with Image Change
  • Figure 4: Scatter plot for correlation between MI and CMI.
  • Figure 5: MMSE curves examples for 10 categories.
  • ...and 13 more figures
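Figures 1 and 4 compare heatmaps against how much the image actually changes under a prompt intervention. The following is a minimal sketch of one such comparison. It assumes the change is measured as the per-pixel L2 distance across color channels between the original and edited images, and that the score is a Pearson correlation over pixels. The function names and the synthetic image, "edited region", and heatmaps are hypothetical, not the paper's evaluation code.

```python
import numpy as np


def pixel_change(original, edited):
    """Per-pixel L2 change across channels for (H, W, C) float images."""
    return np.linalg.norm(edited - original, axis=-1)


def heatmap_change_correlation(heatmap, original, edited):
    """Pearson correlation over pixels between an (H, W) heatmap and the image change."""
    change = pixel_change(original, edited)
    return float(np.corrcoef(heatmap.ravel(), change.ravel())[0, 1])


# Synthetic example (hypothetical data): the "intervention" only alters one region
rng = np.random.default_rng(0)
original = rng.random((8, 8, 3))
region = np.zeros((8, 8))
region[2:5, 3:6] = 1.0
edited = original + 0.5 * region[..., None]

cmi_like = region + 0.05 * rng.random((8, 8))   # heatmap concentrated on the edit
attn_like = rng.random((8, 8))                  # heatmap unrelated to the edit

print("CMI-like map:", heatmap_change_correlation(cmi_like, original, edited))
print("attention-like map:", heatmap_change_correlation(attn_like, original, edited))
```

A real comparison would substitute the model's CMI and attention maps, resized to image resolution, and the actual original and intervened images.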