Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions

Ada Gorgun, Fawaz Sammani, Nikos Deligiannis, Bernt Schiele, Jonas Fischer

TL;DR

This work introduces Prompt-Conditioned Interventions (PCI) and Concept Insertion Success (CIS) to quantify how semantic concepts emerge and stabilize along diffusion trajectories in a training-free, model-agnostic manner. By switching prompts at varying timesteps and evaluating concept presence with LVLM-based VQA, the authors reveal consistent temporal patterns across architectures, show context-dependent insertability, and demonstrate actionable editing windows. The approach enables cross-model and fine-grained analyses of concept dynamics, and yields a CIS-guided editing method that outperforms strong baselines in preserving content while enforcing semantic changes. Overall, PCI/CIS provide a practical framework for temporally aware evaluation and editing of diffusion-based generative systems.

Abstract

Diffusion models, which gradually denoise random noise into meaningful images, are usually evaluated by their final outputs. Yet generation unfolds along a trajectory, and analyzing this dynamic process is crucial for understanding how controllable, reliable, and predictable these models are in terms of their success and failure modes. In this work, we ask the question: when does noise turn into a specific concept (e.g., age) and become locked into the denoising trajectory? We propose PCI (Prompt-Conditioned Intervention) to study this question. PCI is a training-free and model-agnostic framework for analyzing concept dynamics through diffusion time. The central idea is the analysis of Concept Insertion Success (CIS), defined as the probability that a concept inserted at a given timestep is preserved and reflected in the final image, offering a way to characterize the temporal dynamics of concept formation. Applied to several state-of-the-art text-to-image diffusion models and a broad taxonomy of concepts, PCI reveals diverse temporal behaviors across diffusion models, in which certain phases of the trajectory are more favorable to specific concepts even within the same concept type. These findings also provide actionable insights for text-driven image editing, highlighting when interventions are most effective without requiring access to model internals or training, and yielding quantitatively stronger edits that achieve a better balance of semantic accuracy and content preservation than strong baselines. Code is available at: https://github.com/adagorgun/PCI-Prompt-Controlled-Interventions
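
As a rough illustration of the CIS definition above, the probability can be estimated empirically as the fraction of runs in which the concept is judged present in the final image. The sketch below assumes that binary per-seed verdicts (for example, from a VQA model) are already available. The function name `concept_insertion_success`, the data layout, and the toy values are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List


def concept_insertion_success(verdicts: Dict[int, List[bool]]) -> Dict[int, float]:
    """Estimate CIS per switch timestep as the fraction of positive verdicts.

    `verdicts` maps a switch timestep to one boolean per seed: True if the
    concept was judged present in the final image (hypothetical layout).
    """
    cis = {}
    for t_s, outcomes in sorted(verdicts.items()):
        if not outcomes:
            raise ValueError(f"no verdicts recorded for timestep {t_s}")
        # Booleans sum as 0/1, so this is the empirical success rate.
        cis[t_s] = sum(outcomes) / len(outcomes)
    return cis


# Toy data, invented for illustration only (not results from the paper).
toy = {
    0: [True, True, True, True],
    20: [True, True, False, True],
    40: [False, True, False, False],
}
print(concept_insertion_success(toy))
```

Averaging over more seeds and base prompts would tighten the estimate; the number of runs needed is not stated in this summary.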

Paper Structure

This paper contains 34 sections, 9 equations, 18 figures, 3 tables.

Figures (18)

  • Figure 1: Prompt-conditioned intervention (PCI) over diffusion timesteps. We propose to study when noise turns into a specific concept through the lens of concept insertion success, i.e., the probability that a concept inserted at a given timestep appears in the final image. By switching the base prompt (top) at different points of the diffusion process to a prompt composed of the base prompt and the concept of interest (old), we can measure this success rate (CIS curve) across different seeds and base prompts to analyze the temporal dependency and influence of concepts in diffusion models.
  • Figure 2: Overview of the PCI framework. A base prompt $P_b$ is used as conditioning for generation, altered to the concept prompt $P_c$ at time $t_s$. The generated images are evaluated through VQA to determine concept presence and aggregated across seeds to obtain CIS across the diffusion trajectory.
  • Figure 3: CIS reveals cross-concept and cross-architecture differences. CIS for $\bullet$ $\tau_{50}$ and $\bullet$ $\tau_{70}$ across multiple concept categories and diffusion models.
  • Figure 4: Revealing context-dependent differences for the same concept. We show how the CIS curve of a concept differs when the base prompt provides different contexts.
  • Figure 5: Examples of text-driven image editing on SDXL. The edited images are shown at four different points with their respective CIS probabilities: $\tau_{30}$, $\tau_{50}$, $\tau_{70}$, and $\tau_{90}$. High probabilities until a certain point ensure the intended modification but reduce preservation of the original image. We observe that CIS probabilities above 0.7 start to noticeably compromise the original content, and that probabilities between 0.5 and 0.7, as suggested by our analysis (red rectangle), are best for editing while preserving the original image.
  • ...and 13 more figures
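
The prompt switch described in the Figure 1 and Figure 2 captions, and the CIS levels used in Figure 5, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: steps are indexed in denoising order (0 is pure noise), the example prompts are invented, and reading $\tau_p$ as the latest switch step whose CIS is still at least $p/100$ is an assumption, since the captions do not define it. The helper names `prompt_schedule` and `latest_step_with_cis` are hypothetical.

```python
from typing import Dict, List, Optional


def prompt_schedule(base_prompt: str, concept_prompt: str,
                    switch_step: int, num_steps: int) -> List[str]:
    """Return the conditioning prompt for each denoising step.

    The base prompt is used before `switch_step`; the concept prompt is used
    from `switch_step` onward (assumed indexing: 0 = first denoising step).
    """
    if not 0 <= switch_step <= num_steps:
        raise ValueError("switch_step must lie in [0, num_steps]")
    return [base_prompt if i < switch_step else concept_prompt
            for i in range(num_steps)]


def latest_step_with_cis(cis_curve: Dict[int, float], level: float) -> Optional[int]:
    """Latest switch step whose CIS is at least `level` (assumed reading of tau)."""
    eligible = [t for t, p in cis_curve.items() if p >= level]
    return max(eligible) if eligible else None


# Hypothetical prompts; only the concept "old" comes from the Figure 1 caption.
schedule = prompt_schedule("a portrait of a man", "a portrait of an old man",
                           switch_step=2, num_steps=5)
print(schedule)

# Invented CIS values for illustration only (not results from the paper).
toy_curve = {0: 1.0, 10: 0.9, 20: 0.6, 30: 0.2}
print(latest_step_with_cis(toy_curve, 0.5))
```

In a real pipeline the per-step prompt would be encoded and passed to the denoiser as conditioning; that part is omitted here because it depends on the specific model.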