Localizing Knowledge in Diffusion Transformers

Arman Zarei, Samyadeep Basu, Keivan Rezaei, Zihao Lin, Sayan Nag, Soheil Feizi

TL;DR

This work tackles the problem of understanding where knowledge is encoded in Diffusion Transformers (DiTs) and enabling targeted, efficient edits. It introduces a model- and knowledge-agnostic localization method based on attention-contribution signals to identify the top-$K$ blocks that carry specific concepts, evaluating across three DiT architectures (PixArt-$\alpha$, FLUX, SANA) and six knowledge categories. A new LocK (Localization of Knowledge) probe dataset supports large-scale, diverse evaluation. Building on localization, the authors demonstrate practical applications in model personalization and concept unlearning with localized fine-tuning that preserves unrelated content and reduces computational costs, offering a path toward more interpretable and efficient DiT editing.

Abstract

Understanding how knowledge is distributed across the layers of generative models is crucial for improving interpretability, controllability, and adaptation. While prior work has explored knowledge localization in UNet-based architectures, Diffusion Transformer (DiT)-based models remain underexplored in this context. In this paper, we propose a model- and knowledge-agnostic method to localize where specific types of knowledge are encoded within the DiT blocks. We evaluate our method on state-of-the-art DiT-based models, including PixArt-alpha, FLUX, and SANA, across six diverse knowledge categories. We show that the identified blocks are both interpretable and causally linked to the expression of knowledge in generated outputs. Building on these insights, we apply our localization framework to two key applications: model personalization and knowledge unlearning. In both settings, our localized fine-tuning approach enables efficient and targeted updates, reducing computational cost, improving task-specific performance, and better preserving general model behavior with minimal interference to unrelated or surrounding content. Overall, our findings offer new insights into the internal structure of DiTs and introduce a practical pathway for more interpretable, efficient, and controllable model editing.

Paper Structure

This paper contains 27 sections, 9 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Localization across various DiT models and knowledge categories. For each model, heatmaps indicate the frequency of each block being selected as a dominant carrier of different target knowledge. Green-bordered images are standard generations, while red-bordered images result from withholding knowledge-specific information in the localized blocks. Our method successfully localizes diverse knowledge types, with variation in localization patterns across models.
  • Figure 2: Targeted fine-tuning via knowledge localization. Given a concept to personalize or remove, our method first identifies the most relevant blocks via knowledge localization and restricts fine-tuning to those blocks. This enables efficient adaptation (top) and targeted suppression (bottom) with minimal impact on surrounding content, while better preserving the model’s prior performance.
  • Figure 3: Overview of our knowledge localization method. We first generate images from prompts $\{p_i^\kappa\}$ containing target knowledge $\kappa$, and compute token-level attention contributions across layers. Aggregated scores identify the top-$K$ blocks $\mathcal{B}_K^\kappa$ most responsible for encoding $\kappa$. Replacing their inputs with knowledge-agnostic prompts $\{p_i^{\kappa\text{-neutral}}\}$ suppresses the knowledge in the output.
  • Figure 4: Differences in how knowledge is localized across categories and models. We show LLaVA-based evaluations and generation samples as the number of intervened blocks $K$ increases, where $K$ denotes the top-$K$ most informative blocks identified by our localization method. Some knowledge types (e.g., copyright) are highly concentrated in a few blocks, while others (e.g., animals) are more distributed across the model. Examples include outputs from the base models and their intervened counterparts.
  • Figure 5: Variation in how artistic styles are localized within the model. We report CSD scores for various artists in the PixArt-$\alpha$ model as the number of intervened blocks $K$ increases. The numbers indicate how many artist styles remain identifiable at each $K$. While styles like Patrick Caulfield are localized in fewer blocks, others like Van Gogh are distributed across more blocks.
  • ...and 6 more figures
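
The block-selection step described in the Figure 3 caption (aggregate per-block attention contributions over prompts, then keep the top-$K$ blocks) and the selection-frequency heatmap of Figure 1 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the per-prompt, per-block contribution scores are taken as given, the aggregation is assumed to be a plain mean over prompts, and the function names `top_k_blocks` and `selection_frequency` are hypothetical.

```python
from collections import Counter
from statistics import mean


def top_k_blocks(scores, k):
    """Return indices of the k blocks with the highest aggregated score.

    scores[i][b] is the (assumed precomputed) attention contribution of
    block b for prompt i. Aggregation by mean is an assumption here; the
    source only says the scores are "aggregated".
    """
    if not scores:
        raise ValueError("need at least one prompt")
    num_blocks = len(scores[0])
    # Average each block's contribution over all prompts.
    aggregated = [mean(row[b] for row in scores) for b in range(num_blocks)]
    # Rank blocks from most to least contributing; ties keep index order.
    ranked = sorted(range(num_blocks), key=lambda b: aggregated[b], reverse=True)
    return ranked[:k]


def selection_frequency(per_concept_scores, k):
    """Count how often each block lands in the top-k across concepts.

    per_concept_scores maps a concept name to its scores matrix. The
    resulting counts are the kind of quantity a frequency heatmap shows.
    """
    counts = Counter()
    for scores in per_concept_scores.values():
        counts.update(top_k_blocks(scores, k))
    return counts


# Toy usage with made-up numbers (4 blocks, 2 concepts).
toy = {
    "style": [[0.1, 0.9, 0.3, 0.5], [0.2, 0.8, 0.1, 0.7]],
    "animal": [[0.9, 0.1, 0.2, 0.8]],
}
selected = top_k_blocks(toy["style"], k=2)
freq = selection_frequency(toy, k=2)
```

The returned indices would then determine which blocks receive the knowledge-agnostic prompt during intervention, or which blocks are unfrozen for localized fine-tuning; how the contribution scores themselves are computed is left outside this sketch.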