SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder

Ronen Kamenetsky; Sara Dorfman; Daniel Garibi; Roni Paiss; Or Patashnik; Daniel Cohen-Or

TL;DR

SAEdit addresses the challenge of disentangled and continuous image editing by performing token-level manipulation of text embeddings through a Sparse Autoencoder (SAE). It derives sparse, attribute-specific edit directions by comparing the sparse representations of source and target prompts, then applies these directions to individual tokens with a controllable scale factor $\omega$ while keeping the diffusion renderer unchanged. The method is model-agnostic, enabling application across backbones like Flux and Stable Diffusion, and introduces an exponential injection schedule $\omega_t = \min\left(e^{t \cdot \omega} - 1, \tau\right)$ to preserve global structure during editing. Extensive experiments, including quantitative benchmarks and real-image editing via inversion, demonstrate strong identity preservation, high prompt fidelity, and robust, continuous control across diverse attributes and domains.
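To make the injection schedule concrete, the following minimal sketch evaluates $\omega_t = \min\left(e^{t \cdot \omega} - 1, \tau\right)$ at a few timesteps. The function name `injection_scale`, the sample values of $\omega$ and $\tau$, and the assumption that $t$ is normalized to $[0, 1]$ and grows as the edit is phased in are illustrative choices, not details taken from the paper.

```python
import math


def injection_scale(t: float, omega: float, tau: float) -> float:
    """Evaluate omega_t = min(exp(t * omega) - 1, tau).

    t     : timestep, assumed here to be normalized to [0, 1]
    omega : user-chosen edit strength
    tau   : upper bound on the injected scale
    """
    return min(math.exp(t * omega) - 1.0, tau)


# Illustrative values only; omega and tau are not taken from the paper.
omega, tau = 2.0, 3.0
schedule = [injection_scale(i / 4, omega, tau) for i in range(5)]

for i, scale in enumerate(schedule):
    print(f"t={i / 4:.2f}  omega_t={scale:.3f}")
```

Under these assumptions the scale is exactly zero at $t = 0$, grows exponentially, and is clipped at $\tau$ once $e^{t \cdot \omega} - 1$ exceeds it, so the edit is weakest where the schedule starts and bounded everywhere else.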

Abstract

Large-scale text-to-image diffusion models have become the backbone of modern image editing, yet text prompts alone do not offer adequate control over the editing process. Two properties are especially desirable: disentanglement, where changing one attribute does not unintentionally alter others, and continuous control, where the strength of an edit can be smoothly adjusted. We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings. The edits are applied by manipulating the embeddings along carefully chosen directions, which control the strength of the target attribute. To identify such directions, we employ a Sparse Autoencoder (SAE), whose sparse latent space exposes semantically isolated dimensions. Our method operates directly on text embeddings without modifying the diffusion process, making it model agnostic and broadly applicable to various image synthesis backbones. Experiments show that it enables intuitive and efficient manipulations with continuous control across diverse attributes and domains.
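The token-level idea can be illustrated with a toy NumPy sketch. Everything here is an assumption made for illustration: the SAE weights are random rather than trained, the helper names (`encode`, `decode`, `edit_direction`, `edit_token`) are hypothetical, keeping the `k` largest latent differences is only one plausible way to obtain a sparse direction, and adding the decoded direction to the token embedding is a simplification rather than the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sparse = 8, 32  # toy sizes; real text encoders are far larger

# Hypothetical SAE parameters. A real SAE would be trained on text embeddings.
W_enc = rng.normal(size=(d_sparse, d_model))
b_enc = np.zeros(d_sparse)
W_dec = rng.normal(size=(d_model, d_sparse))


def encode(x: np.ndarray) -> np.ndarray:
    """Map an embedding to a non-negative latent code (ReLU encoder)."""
    return np.maximum(W_enc @ x + b_enc, 0.0)


def decode(z: np.ndarray) -> np.ndarray:
    """Map a latent code back to embedding space (linear decoder, no bias)."""
    return W_dec @ z


def edit_direction(src_emb: np.ndarray, tgt_emb: np.ndarray, k: int = 4) -> np.ndarray:
    """Compare source and target latents and keep the k largest differences."""
    diff = encode(tgt_emb) - encode(src_emb)
    keep = np.argsort(np.abs(diff))[-k:]
    sparse = np.zeros_like(diff)
    sparse[keep] = diff[keep]
    return sparse


def edit_token(token_emb: np.ndarray, direction: np.ndarray, scale: float) -> np.ndarray:
    """Shift one token embedding along the decoded direction by `scale`."""
    return token_emb + scale * decode(direction)


# Stand-ins for the embeddings of a source and a target prompt token.
src, tgt = rng.normal(size=d_model), rng.normal(size=d_model)
direction = edit_direction(src, tgt, k=4)

# A prompt of 5 token embeddings; only the token at index 2 is edited.
tokens = rng.normal(size=(5, d_model))
edited = tokens.copy()
edited[2] = edit_token(tokens[2], direction, scale=1.5)
```

Because only one row of `edited` changes and the diffusion model itself is never touched, the sketch reflects the two properties the abstract emphasizes: the edit is localized to a chosen token, and its strength is governed by a single continuous scale.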
