Editing Implicit Assumptions in Text-to-Image Diffusion Models

Hadas Orgad; Bahjat Kawar; Yonatan Belinkov

TL;DR

TIME is a lightweight, post hoc method for editing implicit assumptions in text-to-image diffusion models. It minimally adjusts the cross-attention projection matrices so that a source prompt is mapped close to a destination prompt that carries the desired attribute. The edit has a closed-form, parallelizable solution and modifies only a small fraction of the model's parameters (2.2%). The method is evaluated on Stable Diffusion using TIMED, a benchmark dataset introduced in this work. It is also applied to mitigating gender bias in images of professions, where it improves representation while maintaining image quality. The authors position TIME as a step toward controllable, bias-aware diffusion models that do not require full retraining.

Abstract

Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present in the training data. Thus, there is a need to control these assumptions without requiring explicit user input or costly re-training. In this work, we aim to edit a given implicit assumption in a pre-trained diffusion model. Our Text-to-Image Model Editing method, TIME for short, receives a pair of inputs: a "source" under-specified prompt for which the model makes an implicit assumption (e.g., "a pack of roses"), and a "destination" prompt that describes the same setting, but with a specified desired attribute (e.g., "a pack of blue roses"). TIME then updates the model's cross-attention layers, as these layers assign visual meaning to textual tokens. We edit the projection matrices in these layers such that the source prompt is projected close to the destination prompt. Our method is highly efficient, as it modifies a mere 2.2% of the model's parameters in under one second. To evaluate model editing approaches, we introduce TIMED (TIME Dataset), containing 147 source and destination prompt pairs from various domains. Our experiments (using Stable Diffusion) show that TIME is successful in model editing, generalizes well for related prompts unseen during editing, and imposes minimal effect on unrelated generations.
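The projection edit described above can be illustrated with a small numerical sketch. It assumes the edit is posed as a ridge-regularized least-squares problem, minimizing sum_i ||W' c_i - v_i||^2 + lam * ||W' - W||_F^2, where c_i are source-prompt token embeddings and v_i are the destination prompt's keys or values under the original matrix W. The function name `edit_projection`, the parameter `lam`, the toy dimensions, and the assumption that source and destination tokens are already aligned one-to-one are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def edit_projection(W, src_embs, dst_targets, lam=0.1):
    """Closed-form minimizer of
        sum_i ||W_new @ c_i - v_i||^2 + lam * ||W_new - W||_F^2
    W:           (d_out, d_in) original projection matrix
    src_embs:    (n, d_in)  rows are source token embeddings c_i
    dst_targets: (n, d_out) rows are target keys/values v_i
    lam:         hypothetical regularization strength (> 0)
    """
    d_in = W.shape[1]
    # Setting the gradient to zero gives W_new @ B = A with:
    A = lam * W + dst_targets.T @ src_embs          # (d_out, d_in)
    B = lam * np.eye(d_in) + src_embs.T @ src_embs  # (d_in, d_in), symmetric
    # B is symmetric, so W_new = A @ inv(B) = solve(B, A.T).T
    return np.linalg.solve(B, A.T).T

# Toy demo with random data (not real text-encoder embeddings).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))       # stands in for W_K or W_V
C_src = rng.normal(size=(2, 6))   # source prompt token embeddings
C_dst = rng.normal(size=(2, 6))   # aligned destination token embeddings
V = (W @ C_dst.T).T               # destination keys/values under the old W

W_new = edit_projection(W, C_src, V, lam=0.1)

# A direction orthogonal to every source embedding is left unchanged.
x = rng.normal(size=6)
u = x - C_src.T @ np.linalg.solve(C_src @ C_src.T, C_src @ x)
unchanged = np.allclose(W_new @ u, W @ u)
```

In this sketch a smaller `lam` pulls the source embeddings closer to the destination targets, while a larger `lam` keeps the edited matrix closer to the original. Because the update only acts along the span of the source embeddings, inputs orthogonal to them are projected exactly as before, which gives some intuition for why such an edit can leave unrelated prompts largely unaffected.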

Paper Structure (28 sections, 8 equations, 12 figures, 7 tables)

This paper contains 28 sections, 8 equations, 12 figures, 7 tables.

Introduction
Related Work
Background
TIME: Text-to-Image Model Editing
Experiments
Implementation Details
TIME Dataset
Qualitative Evaluation
Evaluation Metrics
Quantitative Evaluation
TIME for Gender Bias Mitigation
Data Preparation
Method Description
Gender Bias Estimation
Results
...and 13 more sections

Figures (12)

Figure 1: TIME edits implicit assumptions in a model (e.g., roses are red). As a result, related prompts (green) change their behavior, while unrelated ones (gray) do not. For example, after model editing, the roses in "A field of roses" become blue.
Figure 2: Text-to-image models make implicit assumptions on the world when generating images, as seen in the top row (e.g., roses are red). In the bottom row, we override these assumptions by explicitly specifying different attributes in the prompt.
Figure 3: A cross-attention layer in a text-to-image diffusion model. We target the strictly text-based layers and the information they encode (highlighted in red).
Figure 4: An overview of TIME. ${\mathbf{W}'}_K$ and ${\mathbf{W}'}_V$ are edited to map the source prompt's embeddings close to the destination prompt's keys and values. The loss is regularized for specificity.
Figure 5: Using TIME, image generations for the source prompt mimic the destination prompt's oracle behavior.
...and 7 more figures
