
Low-Resolution Editing is All You Need for High-Resolution Editing

Junsung Lee, Hyunsoo Lee, Yong Jae Lee, Bohyung Han

TL;DR

This paper tackles high-resolution image editing by introducing ScaleEdit, a test-time optimization framework that edits high-resolution inputs patch-by-patch using a learnable feature-space transfer function to inject fine-scale details from the high-resolution source. A cross-patch synchronization mechanism, combining Blended Tweedie updates and a resampling strategy, ensures global coherence without requiring overlapping inference. By leveraging low-resolution editing priors through diffusion models, ScaleEdit achieves high-fidelity edits at 1K and 2K resolutions, outperforming diffusion-based SR baselines on metrics like MSE, SSIM, PSNR, and HaarPSI while preserving source textures. The approach generalizes across backbones (Stable Diffusion and FLUX) and demonstrates practical potential for high-fidelity, user-guided high-resolution content creation, albeit with reliance on base model quality and potential artifact risk from pre-trained priors.
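
Two of the fidelity metrics named above, MSE and PSNR, have standard definitions that are easy to state in code. The following is a minimal sketch under stated assumptions, not the paper's evaluation script: it treats images as nested lists of grayscale values in the range 0 to 255, and the function names `mse` and `psnr` are illustrative.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(ra) for ra in a)
    return total / count


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 example (assumed values, not data from the paper).
source = [[0, 255], [128, 64]]
edited = [[0, 250], [128, 64]]
print(mse(source, edited))
print(psnr(source, edited))
```

Lower MSE and higher PSNR both indicate that the output stays closer to the reference image pixel by pixel.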

Abstract

High-resolution content creation is rapidly emerging as a central challenge in both the vision and graphics communities. While images serve as the most fundamental modality for visual expression, content generation that aligns with user intent requires effective, controllable high-resolution image manipulation mechanisms. However, existing approaches remain limited to low-resolution settings, typically supporting only up to 1K resolution. In this work, we introduce the task of high-resolution image editing and propose a test-time optimization framework to address it. Our method performs patch-wise optimization on high-resolution source images, followed by a fine-grained detail transfer module and a novel synchronization strategy to maintain consistency across patches. Extensive experiments show that our method produces high-quality edits, paving the way toward high-resolution content creation.
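
To make the patch-wise setup concrete, the sketch below splits an image into a grid of non-overlapping patches and reassembles it. It is only an illustration of the bookkeeping: the helper names `split_into_patches` and `merge_patches` are hypothetical, the image is a plain nested list, and the per-patch optimization, detail transfer, and synchronization steps described in the abstract are not modeled here.

```python
def split_into_patches(image, patch):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch tiles.

    Returns a dict keyed by (row_index, col_index) in the patch grid.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = {}
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches[(top // patch, left // patch)] = tile
    return patches


def merge_patches(patches, patch):
    """Reassemble tiles produced by split_into_patches into a full image."""
    rows = max(r for r, _ in patches) + 1
    cols = max(c for _, c in patches) + 1
    image = [[None] * (cols * patch) for _ in range(rows * patch)]
    for (r, c), tile in patches.items():
        for dy, line in enumerate(tile):
            image[r * patch + dy][c * patch:(c + 1) * patch] = line
    return image


# Toy 4x4 "image" split into four 2x2 patches (assumed sizes for illustration).
image = [[y * 4 + x for x in range(4)] for y in range(4)]
patches = split_into_patches(image, 2)
restored = merge_patches(patches, 2)
```

Because the tiles do not overlap, each one can be processed independently, which is why a separate mechanism is needed to keep neighboring patches consistent.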

Paper Structure

This paper contains 40 sections, 20 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Generated images using the proposed method, ScaleEdit. Our method successfully generates high-resolution edited images by leveraging low-resolution editing results as reference images.
  • Figure 2: Overview of the proposed method. (Left) We first optimize the transfer function $\phi_{\theta}(\mathbf{h}_i[t], t)$ to capture fine-grained details encoded in the high-resolution source trajectory $\{\mathbf{x}^{\mathrm{high}}_t[i] \}_{t=0}^{T}$. We then apply the optimized transfer function during the reverse process of $\{ \mathbf{\tilde{y}}_t [i]\}_{t=0}^{T}$, yielding a detail-enhanced latent $\mathbf{\tilde{y}}_0[i] = \mathbf{y}_{0}^{\mathrm{high}}[i]$. (Right) We illustrate how the transfer function modulates the intermediate feature within the diffusion model.
  • Figure 3: Overview of the synchronization strategy. Starting from the detail enhancement process (Eq. \ref{eq:ddim_reverse_detail_inject}), we perform resampling (Eq. \ref{eq:resample_inversion}) and compute the blended Tweedie estimate. This estimate is then used to synchronize adjacent patches during the reverse process (Eq. \ref{eq:resample_reverse}).
  • Figure 4: [Best visualized when magnified.] Qualitative comparison of ScaleEdit with diffusion-based super-resolution baselines \cite{cheng2025effective, duan2025dit4sr, sun2024pisasr, dong2025tsd}. The first three rows show 1K editing results, while the last three rows show the 2K editing scenario. Here, we use the pretrained Stable Diffusion \cite{rombach2022high} for ScaleEdit.
  • Figure 5: [Best visualized when magnified.] Part (a) visualizes the effect of the synchronization strategy, while part (b) shows the results of our method combined with the pretrained FLUX 1-dev \cite{flux2024} model.
  • ...and 3 more figures