Stable Flow: Vital Layers for Training-Free Image Editing

Omri Avrahami; Or Patashnik; Ohad Fried; Egor Nemchinov; Kfir Aberman; Dani Lischinski; Daniel Cohen-Or

TL;DR

This work tackles training-free image editing with Diffusion Transformer (DiT) models by automatically identifying a set of vital layers whose features are crucial for image formation. It introduces an attention-injection mechanism that uses these vital layers to produce stable, prompt-consistent edits across a range of tasks, including non-rigid deformations and object manipulation. To extend editing to real images, it combines a latent nudging technique with inversion based on an inverse Euler ODE solver, which improves reconstruction and keeps edits controlled. Qualitative and quantitative comparisons, together with a user study, support the approach's effectiveness and versatility, and the layer analysis has potential implications for model pruning and distillation.
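To make the inversion idea concrete, the following is a minimal sketch of inverting a flow model by running Euler steps in the opposite time direction. It is not the paper's implementation: `toy_velocity` is a hypothetical stand-in for a learned velocity field, the convention that t = 0 is the image and t = 1 is noise is an assumption, and latent nudging is not shown because the summary above does not define it.

```python
from typing import Callable, List

Vector = List[float]
Velocity = Callable[[Vector, float], Vector]


def toy_velocity(x: Vector, t: float) -> Vector:
    """Hypothetical stand-in for a learned velocity field v(x, t)."""
    return [-(1.0 + t) * xi for xi in x]


def euler_integrate(x: Vector, velocity: Velocity,
                    t_start: float, t_end: float, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t_start to t_end with fixed Euler steps."""
    dt = (t_end - t_start) / num_steps  # negative when integrating backwards
    t = t_start
    for _ in range(num_steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def invert(image_latent: Vector, velocity: Velocity, num_steps: int) -> Vector:
    """Map an image latent to a noise latent (assumed: t = 0 image, t = 1 noise)."""
    return euler_integrate(image_latent, velocity, 0.0, 1.0, num_steps)


def sample(noise_latent: Vector, velocity: Velocity, num_steps: int) -> Vector:
    """Map a noise latent back to an image latent by integrating from 1 to 0."""
    return euler_integrate(noise_latent, velocity, 1.0, 0.0, num_steps)


def round_trip_error(latent: Vector, velocity: Velocity, num_steps: int) -> float:
    """Largest per-coordinate gap between a latent and its reconstruction."""
    recon = sample(invert(latent, velocity, num_steps), velocity, num_steps)
    return max(abs(a - b) for a, b in zip(latent, recon))
```

In this toy setting the round trip is only approximate, and the reconstruction error shrinks as the number of Euler steps grows. That imperfect reconstruction is the kind of gap a better inversion method would aim to close.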

Abstract

Diffusion models have revolutionized the field of content synthesis and editing. Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT), and employed flow-matching for improved training and sampling. However, they exhibit limited generation diversity. In this work, we leverage this limitation to perform consistent image edits via selective injection of attention features. The main challenge is that, unlike the UNet-based models, DiT lacks a coarse-to-fine synthesis structure, making it unclear in which layers to perform the injection. Therefore, we propose an automatic method to identify "vital layers" within DiT, crucial for image formation, and demonstrate how these layers facilitate a range of controlled stable edits, from non-rigid modifications to object addition, using the same mechanism. Next, to enable real-image editing, we introduce an improved image inversion method for flow models. Finally, we evaluate our approach through qualitative and quantitative comparisons, along with a user study, and demonstrate its effectiveness across multiple applications. The project page is available at https://omriavrahami.com/stable-flow
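One way an automatic search for vital layers could work is a bypass ablation: skip one layer at a time, compare the result with the unmodified output, and keep the layers whose removal changes the output most. The sketch below illustrates that idea on a toy residual stack. The bypass procedure, the L2 distance, and every name in it (`make_scaling_layer`, `run_stack`, `score_layers`, `select_vital_layers`) are illustrative assumptions, not details taken from the abstract.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_scaling_layer(strength: float) -> Layer:
    """Hypothetical residual branch that returns strength * x."""
    return lambda x: [strength * xi for xi in x]


def run_stack(x: Vector, layers: Sequence[Layer],
              bypass: Optional[int] = None) -> Vector:
    """Run a residual stack, optionally skipping the layer at index `bypass`."""
    for index, layer in enumerate(layers):
        if index == bypass:
            continue  # only the identity (residual) path is kept
        update = layer(x)
        x = [xi + ui for xi, ui in zip(x, update)]
    return x


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, standing in for a perceptual similarity metric."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def score_layers(x: Vector, layers: Sequence[Layer]) -> List[float]:
    """Score each layer by how far the output moves when it is bypassed."""
    reference = run_stack(x, layers)
    return [l2_distance(reference, run_stack(x, layers, bypass=i))
            for i in range(len(layers))]


def select_vital_layers(scores: Sequence[float], top_k: int) -> List[int]:
    """Return the indices of the top_k highest-impact layers, in layer order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:top_k])
```

With branch strengths such as 0.01, 0.5, 0.02, 0.8, and 0.01, the two strong layers (indices 1 and 3) come out as vital. In a real DiT the distance would be computed on generated images rather than on toy vectors.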
