ChordEdit: One-Step Low-Energy Transport for Image Editing

Liangsi Lu, Xuhang Chen, Minzhe Guo, Shichu Li, Jingchao Wang, Yang Shi

TL;DR

ChordEdit is a model-agnostic, training-free, and inversion-free method for high-fidelity one-step editing on T2I models. It recasts editing as a transport problem between the source and target distributions defined by the source and target text prompts.

Abstract

The advent of one-step text-to-image (T2I) models offers unprecedented synthesis speed. However, their application to text-guided image editing remains severely hampered, as forcing existing training-free editors into a single inference step fails. This failure manifests as severe object distortion and a critical loss of consistency in non-edited regions, resulting from the high-energy, erratic trajectories produced by naive vector arithmetic on the models' structured fields. To address this problem, we introduce ChordEdit, a model-agnostic, training-free, and inversion-free method that facilitates high-fidelity one-step editing. We recast editing as a transport problem between the source and target distributions defined by the source and target text prompts. Leveraging dynamic optimal transport theory, we derive a principled, low-energy control strategy. This strategy yields a smoothed, variance-reduced editing field that is inherently stable, allowing the field to be traversed in a single, large integration step. This theoretically grounded and experimentally validated approach allows ChordEdit to deliver fast, lightweight, and precise edits, finally achieving true real-time editing on these challenging models.

Paper Structure (74 sections, 11 theorems, 136 equations, 24 figures, 4 tables, 2 algorithms)

Key Result

Proposition D.1

Let the observable proxy field $\mathbf{R}(t)$ be in $L^2([0,1]; \mathbb{R}^d)$, and let the chord field $\hat{u}(t)$ be generated by convolution with any non-negative, unit-mass kernel $K_\delta$ as defined above. The total temporal kinetic energy of the chord field is strictly less than or equal t Furthermore, the inequality is strict if $K_\delta$ is not a Dirac delta function (i.e., it perform
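
The contraction claimed here can be checked numerically in a toy setting. The sketch below is not the paper's construction: it assumes a scalar ($d = 1$) proxy field sampled on a uniform grid, a hypothetical noisy signal standing in for $\mathbf{R}(t)$, and a box kernel as one illustrative choice of $K_\delta$. The comparison is against the energy of the unsmoothed field, which is presumably the right-hand side of the inequality, since the statement is cut off in this excerpt. The bound it illustrates is the discrete form of Young's convolution inequality, $\|K * R\|_2 \le \|K\|_1 \|R\|_2$, with $\|K\|_1 = 1$ for a non-negative, unit-mass kernel.

```python
import math
import random

def smooth(signal, kernel):
    """Discrete same-length convolution, treating samples outside the grid as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + half - j  # index of the signal sample paired with kernel[j]
            if 0 <= k < len(signal):
                acc += w * signal[k]
        out.append(acc)
    return out

def energy(signal, dt):
    """Riemann-sum approximation of the integral of |signal(t)|^2 over [0, 1]."""
    return sum(v * v for v in signal) * dt

random.seed(0)
n = 200
dt = 1.0 / n

# Hypothetical proxy field: a smooth trend plus high-frequency jitter (an assumption).
R = [math.sin(2 * math.pi * i * dt) + random.gauss(0.0, 0.5) for i in range(n)]

# Box kernel: non-negative weights that sum to one (illustrative choice of K_delta).
width = 9
K = [1.0 / width] * width

u_hat = smooth(R, K)
E_R = energy(R, dt)
E_u = energy(u_hat, dt)

# A one-tap kernel plays the role of the Dirac delta: it leaves the field unchanged.
E_delta = energy(smooth(R, [1.0]), dt)

print(f"energy of R:            {E_R:.4f}")
print(f"energy after smoothing: {E_u:.4f}")
print(f"energy with delta tap:  {E_delta:.4f}")
```

In this toy run the smoothed field has lower energy than the raw field, while the one-tap kernel reproduces the raw energy, which matches the two cases the proposition distinguishes.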

Figures (24)

  • Figure 1: ChordEdit. These examples demonstrate our model-agnostic, training-free, and inversion-free method operating on fast generative models. ChordEdit mitigates the failures of naive single-step editing by deriving a stable, low-energy control field based on optimal transport theory. This field's stability permits a single, large integration step, facilitating precise edits that preserve non-edited regions. Results shown use SD-Turbo (top two rows) and SwiftBrush-v2 (bottom row). Labels indicate the desired semantic change.
  • Figure 2: Comparison of ChordEdit (SD-Turbo) against one-step, few-step, and multi-step editing methods on PIE-bench [ju2023direct], evaluated on background consistency (PSNR), semantic alignment (CLIP [radford2021learning], referring to CLIP-Edited), and runtime. Our method facilitates real-time text-guided editing while yielding highly competitive results.
  • Figure 3: One-Step Simple drift editing fails. ChordEdit preserves structure. Simple drifts, a direct drift-difference from a one-step model, induce a high-energy, non-smooth vector field, yielding two disqualifying failures: (i) severe object distortion and (ii) background breakup and spurious structures. Zoomed crops (bottom) highlight the distortions in Simple drifts versus the faithful, photorealistic result of ChordEdit.
  • Figure 4: Comparison of editing field stability. (a) Multi-step Simple Drift: In conventional multi-step diffusion, the iterative application of the simple drift $\Delta v$ ensures a stable trajectory. (b) One-step Simple Drift: In distilled models, the naive field $\Delta v(x_t, t)$ is high-energy and volatile. A single, large integration step (solid arrow) accumulates significant error and deviates significantly, as the erratic underlying path (dashed) confirms. (c) Editing by ChordEdit (Ours): We derive a stable, low-energy Chord Control Field by time-averaging the observable fields $\mathbf{R}(x_\tau, t)$ and $\mathbf{R}(x_\tau, t-\delta)$. This smoothed field facilitates an accurate, single-step transport (red arrow) that faithfully reaches the target $x_{\rm tar}$.
  • Figure 5: 2D Toy Example of Distribution transport. Naive residual fields are high-energy and unstable under coarse discretization. ChordEdit computes a low-energy field (Eq. \ref{eq:chord_main}) that drives particles straight to the target with minimal deviation, facilitating reliable one-step transport.
  • ...and 19 more figures
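
The intuition behind the Figure 4 caption, that averaging the proxy field at $t$ and $t-\delta$ makes a single large step more reliable, can be illustrated with a one-dimensional toy. This is a sketch under stated assumptions, not the authors' algorithm: `proxy_field` is a hypothetical field made of a constant edit direction plus a fast oscillation, the state dependence of $\mathbf{R}(x_\tau, \cdot)$ is ignored, and the values of `amp`, `omega`, and `delta` are arbitrary.

```python
import math

def proxy_field(t, drift=1.0, amp=0.8, omega=2 * math.pi * 8):
    """Hypothetical 1D proxy field: constant edit direction plus fast oscillation."""
    return drift + amp * math.sin(omega * t)

def one_step(x0, field_value, h=1.0):
    """Single explicit Euler step of size h along a frozen field value."""
    return x0 + h * field_value

x_src, drift = 0.0, 1.0
x_tar = x_src + drift  # endpoint an oscillation-free field would reach
delta = 0.04           # illustrative look-back offset (an assumption)

# Evaluate both strategies at many query times to average over the oscillation phase.
times = [delta + i * (1.0 - delta) / 499 for i in range(500)]

naive_err, avg_err = [], []
for t in times:
    naive = proxy_field(t)
    averaged = 0.5 * (proxy_field(t) + proxy_field(t - delta))
    naive_err.append((one_step(x_src, naive) - x_tar) ** 2)
    avg_err.append((one_step(x_src, averaged) - x_tar) ** 2)

mse_naive = sum(naive_err) / len(naive_err)
mse_avg = sum(avg_err) / len(avg_err)

print(f"one-step MSE, naive field:    {mse_naive:.4f}")
print(f"one-step MSE, averaged field: {mse_avg:.4f}")
```

For this particular toy field the effect follows from a trigonometric identity: the two-point average scales the oscillation amplitude by $|\cos(\omega\delta/2)| \le 1$, so the one-step endpoint error shrinks for any offset $\delta$. Real editing fields are not pure sinusoids, so the size of the gain here says nothing about the paper's reported results.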

Theorems & Definitions (26)

  • Proposition D.1: $L^2$-Energy Contraction
  • proof
  • Remark D.2: Contraction of Benamou–Brenier Energy
  • Corollary D.3: Pointwise Energy Bound
  • proof
  • Lemma D.4: Local truncation error of explicit Euler under an edit control field
  • proof
  • Proposition D.5: Consistency bound for the chord control field
  • proof
  • Theorem D.6: Global $O(h)$ convergence; chord has smaller constants
  • ...and 16 more
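
Several of the results listed above concern explicit Euler integration, whose global error is first order in the step size $h$. As a minimal sketch of that textbook behaviour, the block below integrates the test equation $\dot{x} = -x$ on $[0, 1]$ rather than any editing field; the ODE, the step counts, and the helper name `euler` are illustrative assumptions. It does not reproduce the paper's claim about the constants for the chord field.

```python
import math

def euler(f, x0, t0, t1, steps):
    """Explicit Euler integration of dx/dt = f(x, t) from t0 to t1."""
    h = (t1 - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x += h * f(x, t)
        t += h
    return x

def decay(x, t):
    """Test equation dx/dt = -x, with exact solution x(t) = exp(-t) for x(0) = 1."""
    return -x

exact = math.exp(-1.0)
step_counts = (10, 20, 40, 80)
errors = [abs(euler(decay, 1.0, 0.0, 1.0, n) - exact) for n in step_counts]

# For a first-order method, halving h should roughly halve the global error.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]

for n, err in zip(step_counts, errors):
    print(f"steps={n:3d}  h={1.0 / n:.4f}  error={err:.6f}")
print("error ratios when h is halved:", [round(r, 3) for r in ratios])
```

The ratios stay close to 2, which is the signature of $O(h)$ global convergence. A one-step editor sits at the coarse end of this scale ($h = 1$), which is why the size of the error constant, and not only the order, matters in that regime.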