ELROND: Exploring and decomposing intrinsic capabilities of diffusion models

Paweł Skierś, Tomasz Trzciński, Kamil Deja

TL;DR

ELROND tackles the opacity of diffusion-model outputs by uncovering intrinsic semantic directions directly in the text-embedding space. By backpropagating the differences between stochastic realizations of the same prompt, and decomposing the resulting gradient set with PCA or Sparse Autoencoders, it yields steerable, token-specific directions for precise control and composition. The method also demonstrates mitigation of mode collapse in distilled models through reintroduction of diversity and introduces a Local Intrinsic Dimensionality (LID) based estimator to quantify concept complexity. The results show token-level composability, improved diversity, and a geometry-grounded framework for unsupervised interpretability in diffusion models. This approach could enhance controllability and analysis of generative models in practical applications.

Abstract

A single text prompt passed to a diffusion model often yields a wide range of visual outputs determined solely by the stochastic process, leaving users with no direct control over which specific semantic variations appear in the image. While existing unsupervised methods attempt to analyze these variations via output features, they omit the underlying generative process. In this work, we propose a framework to disentangle these semantic directions directly within the input embedding space. To that end, we backpropagate the differences between stochastic realizations of a fixed prompt to collect a set of gradients, which we then decompose into meaningful steering directions with either Principal Component Analysis (PCA) or a Sparse Autoencoder (SAE). Our approach yields three key contributions: (1) it isolates interpretable, steerable directions for precise, fine-grained control over a single concept; (2) it effectively mitigates mode collapse in distilled models by reintroducing lost diversity; and (3) it establishes a novel estimator for concept complexity under a specific model, based on the dimensionality of the discovered subspace.
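
The gradient-collection and decomposition step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a toy differentiable generator (`toy_generator`, a tanh of a linear map) stands in for the diffusion model, a squared L2 distance stands in for the difference between realizations, and the names `pair_loss`, `difference_gradient`, `collect_gradients`, and `pca_directions` are hypothetical. Only the PCA variant is shown, and this is not the authors' implementation.

```python
import numpy as np

def toy_generator(e, z, A, B):
    """Hypothetical stand-in for a diffusion model: embedding e, noise z -> output."""
    return np.tanh(A @ e + B @ z)

def pair_loss(e, z1, z2, A, B):
    """Squared L2 difference between two stochastic realizations of one embedding."""
    diff = toy_generator(e, z1, A, B) - toy_generator(e, z2, A, B)
    return 0.5 * float(diff @ diff)

def difference_gradient(e, z1, z2, A, B):
    """Analytic gradient of pair_loss with respect to the embedding e."""
    x1 = toy_generator(e, z1, A, B)
    x2 = toy_generator(e, z2, A, B)
    diff = x1 - x2
    # Chain rule with d tanh(u)/du = 1 - tanh(u)^2 for each realization.
    return A.T @ ((1.0 - x1**2) * diff) - A.T @ ((1.0 - x2**2) * diff)

def collect_gradients(e, A, B, n_pairs, rng):
    """Stack one gradient per random noise pair into an (n_pairs, d) matrix."""
    k = B.shape[1]
    grads = []
    for _ in range(n_pairs):
        z1 = rng.standard_normal(k)
        z2 = rng.standard_normal(k)
        grads.append(difference_gradient(e, z1, z2, A, B))
    return np.stack(grads)

def pca_directions(G, n_components):
    """PCA via SVD of the centered gradient matrix.

    Returns unit-norm directions (rows) and their explained-variance ratios.
    """
    Gc = G - G.mean(axis=0, keepdims=True)
    _, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
d, m, k, r = 16, 32, 8, 3  # embedding dim, output dim, noise dim, toy rank
# Rank-r linear map, so all gradients lie in an r-dimensional subspace.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, d)) / np.sqrt(r * d)
B = rng.standard_normal((m, k)) / np.sqrt(k)
e = rng.standard_normal(d)  # toy "token embedding" for a fixed prompt

G = collect_gradients(e, A, B, n_pairs=200, rng=rng)
directions, explained = pca_directions(G, n_components=r)

# Steering: shift the embedding along the first discovered direction.
alpha = 0.5
steered = e + alpha * directions[0]
```

In this toy setup the linear map has rank 3 by construction, so the first three principal components capture essentially all of the gradient variance. The explained-variance ratios give a rough picture of how many directions the gradient set spans, which is the kind of quantity a dimensionality-based complexity estimate would build on.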

Paper Structure

This paper contains 31 sections, 7 equations, 19 figures, 10 tables.

Figures (19)

  • Figure 1: Unsupervised Discovery of Latent Directions. ELROND extracts, in an unsupervised way, latent directions that allow for exploration of the visual capabilities of the diffusion model. We visualize four directions extracted for the concept "monster" in Flux Schnell.
  • Figure 2: Semantic trajectories within DM conditioning space. Semantic transitions in the diffusion model outputs resulting from linear interpolation (right) do not correspond to a linear direction within the CLIP embedding space (left; first two PCA dimensions). Solid markers denote the final state ($t=1.0$), hollow circles indicate the initialization ($t=0$), and curves are smoothed using cubic spline interpolation.
  • Figure 3: To extract the latent directions for a given concept, we backpropagate the differences between image instances generated from the same prompt to the given token. We then disentangle the latent directions by training an SAE or computing PCA on the gradients.
  • Figure 4: ELROND decomposes the visual capabilities of the diffusion model. We show latent directions discovered in SDXL-DMD.
  • Figure 5: ELROND generates visually diverse variations of concepts (left) as measured by mean DreamSim distance (higher is better). Despite this, it maintains similar text-to-image alignment (right), as measured by CLIP Scores (lower is better).
  • ...and 14 more figures