AutoEdit: Automatic Hyperparameter Tuning for Image Editing

Chau Pham, Quan Dao, Mahesh Bhosale, Yunjie Tian, Dimitris Metaxas, David Doermann

TL;DR

AutoEdit reframes hyperparameter tuning for diffusion-based image editing as a reinforcement learning problem, treating each denoising step as a state and the hyperparameters as time-varying actions. It trains a PPO-based policy in two phases (Phase 1 warm-starts the policy with priors; Phase 2 optimizes it online) and finds near-optimal hyperparameters along a single denoising trajectory, reducing the traditional $\mathcal{O}(TN^K)$ search to $\mathcal{O}(T)$. The reward encodes the editing objectives by balancing prompt alignment against background preservation, and it can be computed with either CLIP or an LVLM to handle both global and localized edits. Empirical results across multiple editing methods and base models show consistent gains in background fidelity and semantic fidelity with minimal inference overhead, which makes diffusion-based editing more practical to deploy. These findings indicate that learned, per-image hyperparameter control can substantially reduce manual tuning while preserving edit quality.
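
To make the complexity claim concrete, the following is a minimal counting sketch. It assumes (the summary does not define the symbols) that $T$ is the number of denoising steps, $K$ the number of hyperparameters, and $N$ the number of candidate values tried per hyperparameter, so a brute-force grid runs one full $T$-step edit for each of the $N^K$ configurations. The function names are hypothetical and are not taken from the AutoEdit code.

```python
def grid_search_denoiser_calls(T: int, N: int, K: int) -> int:
    """Denoiser calls for an exhaustive grid: N**K configurations, T steps each.

    Assumed reading of O(T * N^K); the symbol meanings are not given in the source.
    """
    return T * N ** K


def single_trajectory_denoiser_calls(T: int) -> int:
    """Denoiser calls when a policy picks hyperparameters step by step along one run."""
    return T


if __name__ == "__main__":
    T, N, K = 50, 10, 2  # illustrative values only, not from the paper
    print(grid_search_denoiser_calls(T, N, K))   # 5000
    print(single_trajectory_denoiser_calls(T))   # 50
```

The count covers only inference-time denoiser calls for one image; it does not include the cost of training the policy.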

Abstract

Recent advances in diffusion models have revolutionized text-guided image editing, yet existing editing methods face critical challenges in hyperparameter identification. To achieve reasonable editing performance, these methods often require the user to tune multiple interdependent hyperparameters by brute force, such as the inversion timestep and the attention modification. This process incurs high computational cost because the hyperparameter search space is huge. We treat the search for optimal editing hyperparameters as a sequential decision-making task within the diffusion denoising process. Specifically, we propose a reinforcement learning framework that establishes a Markov Decision Process which dynamically adjusts hyperparameters across denoising steps and integrates the editing objectives into a reward function. The method achieves time efficiency through proximal policy optimization while maintaining optimal hyperparameter configurations. Experiments demonstrate a significant reduction in search time and computational overhead compared to existing brute-force approaches, advancing the practical deployment of diffusion-based image editing frameworks in the real world. Code is available at https://github.com/chaupham1709/AutoEdit.git.
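
The sequential-decision view described in the abstract can be illustrated with a toy rollout. This is a sketch under loud assumptions, not the authors' implementation: the latent is a single float instead of a tensor, `toy_denoise_step` is a made-up stand-in for a one-step denoiser, `toy_reward` is a made-up stand-in for an alignment-plus-preservation reward, and `weight` is a hypothetical trade-off coefficient. The PPO update itself is omitted; the sketch only shows how a policy chooses one hyperparameter value per denoising step and receives a reward at the end of the trajectory.

```python
from typing import Callable, List, Tuple

State = float   # toy stand-in for the latent x_t (a real latent is a tensor)
Action = float  # toy stand-in for the step-wise hyperparameter H_t, in [0, 1]

EDIT_TARGET = 1.0  # hypothetical "fully edited" value for the toy latent


def toy_denoise_step(x_t: State, h_t: Action) -> State:
    """Hypothetical one-step update: h_t controls how strongly the edit is applied."""
    return x_t + 0.1 * h_t * (EDIT_TARGET - x_t)


def toy_reward(x_0: State, source: State, weight: float = 0.5) -> float:
    """Hypothetical reward: edit alignment plus weighted closeness to the source."""
    alignment = -abs(EDIT_TARGET - x_0)   # higher when the edit is achieved
    preservation = -abs(source - x_0)     # higher when the source is preserved
    return alignment + weight * preservation


def rollout(
    policy: Callable[[State, int], Action], x_T: State, T: int
) -> Tuple[State, List[Action], float]:
    """Run one trajectory: the policy picks H_t at every step t = T, ..., 1."""
    x = x_T
    actions: List[Action] = []
    for t in range(T, 0, -1):
        h_t = policy(x, t)            # action: hyperparameter for this step
        actions.append(h_t)
        x = toy_denoise_step(x, h_t)  # transition to the next state
    return x, actions, toy_reward(x, x_T)


if __name__ == "__main__":
    final_x, chosen, reward = rollout(lambda x, t: 1.0, x_T=0.0, T=5)
    print(final_x, chosen, reward)
```

In a full reinforcement learning setup, many such rollouts would be collected and the policy parameters updated to increase the expected reward; here the policy is just a fixed function for illustration.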

Paper Structure

This paper contains 25 sections, 11 equations, 15 figures, 7 tables, 2 algorithms.

Figures (15)

  • Figure 1: Optimal hyperparameters vary significantly across images: in the timestep experiments, the cat image is edited best at step $30$ while the bird image requires step $40$, and similar variance is observed in the cross-attention ratios of P2P (Hertz et al., 2023). AutoEdit automatically identifies near-optimal configurations across these parameters, matching the performance of manual search (last column).
  • Figure 2: Top: Overview of the proposed AutoEdit framework. A policy model is injected to predict the step-wise hyperparameter $\mathcal{H}_t$ at each denoising step $t$. The predicted $\mathcal{H}_t$ is used with the one-step denoising function $g$ and current state $x_t$ to estimate $x_{t-1}$. Bottom: Architecture of the policy model. Features from the U-Net encoder under the original and edited prompts are extracted and concatenated, followed by several trainable layers to predict the policy output.
  • Figure 3: We compare the qualitative results of AutoEdit with the default hyperparameter choice of the baseline. Overall, AutoEdit can search for better hyperparameters, resulting in better object editing, background preservation, and more natural images.
  • Figure 4: The analysis of a) inversion timestep, b) cross-attention replacement, and c) the reward during the training of the policy model $A_{\theta}$.
  • Figure 5: Edited images with different values of $\beta$.
  • ...and 10 more figures