TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen
TL;DR
TiNO-Edit tackles the challenge of reliable diffusion-based image editing by optimizing diffusion timesteps and input noise, rather than relying solely on model fine-tuning or prompt manipulation. Because it operates in Stable Diffusion's latent space and uses LatentCLIP- and LatentVGG-based losses, it optimizes faster and produces high-fidelity edits that respect both the original image and the target prompt. The method supports a range of editing modes, including text-guided, reference-guided, and stroke-guided editing as well as image composition, and remains compatible with concepts learned through DreamBooth and Textual Inversion. Qualitative and quantitative results show improvements over prior methods across diverse editing tasks, and ablations confirm the importance of masking, timestep optimization, and latent-domain losses. The result is a practical workflow for controllable diffusion-based editing in both creative and applied settings.
Abstract
Despite many attempts to leverage pre-trained text-to-image (T2I) models such as Stable Diffusion (SD) for controllable image editing, producing good, predictable results remains a challenge. Previous approaches have focused either on fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text prompts, and/or learned features for each input image in an attempt to coax the image generator into producing the desired result. However, these approaches all have shortcomings and fail to produce good results in a predictable and controllable manner. To address this problem, we present TiNO-Edit, an SD-based method that focuses on optimizing the noise patterns and diffusion timesteps used during editing, something previously unexplored in the literature. With this simple change, we are able to generate results that better align with the original images while reflecting the desired edit. Furthermore, we propose a set of new loss functions that operate in the latent domain of SD, greatly speeding up optimization compared with prior approaches that operate in the pixel domain. Our method extends easily to SD variants such as Textual Inversion and DreamBooth, which encode new concepts, so that those concepts can be incorporated into the edited results. We present a host of image-editing capabilities enabled by our approach. Our code is publicly available at https://github.com/SherryXTChen/TiNO-Edit.
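To make concrete what "optimizing the noise patterns and diffusion timesteps" refers to, here is a minimal sketch, not the authors' implementation, of the standard forward-noising step z_t = sqrt(ᾱ_t) z_0 + sqrt(1 − ᾱ_t) ε used in SDEdit-style image-to-image editing, where the timestep t and the noise ε are the free quantities. It assumes Stable Diffusion v1.x defaults (512×512 RGB inputs, a 4-channel latent at 1/8 resolution, and 1000 training timesteps with a scaled-linear β schedule from 0.00085 to 0.012). The helper names are hypothetical, and the LatentCLIP/LatentVGG losses that would drive the actual optimization are not reproduced. The sketch also prints the pixel-to-latent element ratio, one illustration of why losses computed on latents can be cheaper than pixel-domain ones.

```python
import math
import random

# Assumption: Stable Diffusion v1.x defaults (not taken from the TiNO-Edit code).
IMG_SHAPE = (3, 512, 512)               # RGB image in pixel space
LATENT_SHAPE = (4, 512 // 8, 512 // 8)  # VAE latent: 4 channels, 8x downsampled
NUM_TRAIN_TIMESTEPS = 1000


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return math.prod(shape)


def scaled_linear_alpha_bars(beta_start=0.00085, beta_end=0.012, n=NUM_TRAIN_TIMESTEPS):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), with betas linear in sqrt-space."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alpha_bars, running = [], 1.0
    for i in range(n):
        beta = (lo + (hi - lo) * i / (n - 1)) ** 2
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_latent(z0, eps, t, alpha_bars):
    """Forward-diffuse a flattened clean latent z0 to timestep t with noise eps.

    Hypothetical helper: the pair (t, eps) fully decides how much of z0
    survives, which is why these are natural optimization variables."""
    a = alpha_bars[t]
    s, n = math.sqrt(a), math.sqrt(1.0 - a)
    return [s * x + n * e for x, e in zip(z0, eps)]


rng = random.Random(0)
size = numel(LATENT_SHAPE)
z0 = [rng.gauss(0.0, 1.0) for _ in range(size)]   # stand-in for an encoded input image
eps = [rng.gauss(0.0, 1.0) for _ in range(size)]  # the noise pattern
alpha_bars = scaled_linear_alpha_bars()

# Larger t keeps less of the input: the signal weight sqrt(alpha_bar_t) shrinks.
for t in (100, 500, 900):
    print(t, round(math.sqrt(alpha_bars[t]), 3))

z_t = noise_latent(z0, eps, 500, alpha_bars)
print("pixel/latent element ratio:", numel(IMG_SHAPE) // numel(LATENT_SHAPE))  # 48
```

This only shows the forward step that the two optimized quantities feed into; it does not implement the denoising pass or the optimization loop itself.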
