Low-Resolution Editing is All You Need for High-Resolution Editing
Junsung Lee, Hyunsoo Lee, Yong Jae Lee, Bohyung Han
TL;DR
This paper introduces ScaleEdit, a test-time optimization framework for high-resolution image editing. ScaleEdit edits a high-resolution input patch by patch and uses a learnable feature-space transfer function to inject fine-scale details from the high-resolution source. A cross-patch synchronization mechanism, which combines Blended Tweedie updates with a resampling strategy, keeps the result globally coherent without requiring overlapping inference. By reusing the low-resolution editing priors of diffusion models, ScaleEdit produces high-fidelity edits at 1K and 2K resolutions and outperforms diffusion-based super-resolution (SR) baselines on metrics such as MSE, SSIM, PSNR, and HaarPSI while preserving source textures. The approach generalizes across backbones (Stable Diffusion and FLUX) and shows practical potential for user-guided high-resolution content creation, though its results depend on the quality of the base model and may inherit artifacts from the pre-trained priors.
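For context, the "Tweedie" in Blended Tweedie updates presumably refers to the standard denoised estimate used with diffusion models. The notation below is the usual variance-preserving (DDPM-style) one and is an assumption, since this summary does not give the paper's own formulation: $x_t$ is the noisy sample at step $t$, $\bar{\alpha}_t$ is the cumulative noise-schedule coefficient, and $\epsilon_\theta$ is the model's noise prediction.

```latex
\hat{x}_0(x_t) \;=\; \frac{x_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}
```

How ScaleEdit blends these estimates across patches is not specified in this summary and is not reproduced here. Flow-based backbones such as FLUX use a different parameterization, so the expression above should be read only as the DDPM-style case.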
Abstract
High-resolution content creation is rapidly emerging as a central challenge in both the vision and graphics communities. While images serve as the most fundamental modality for visual expression, content generation that aligns with user intent requires effective, controllable mechanisms for high-resolution image manipulation. However, existing approaches remain limited to low-resolution settings, typically supporting only up to 1K resolution. In this work, we introduce the task of high-resolution image editing and propose a test-time optimization framework to address it. Our method performs patch-wise optimization on high-resolution source images, followed by a fine-grained detail transfer module and a novel synchronization strategy that maintains consistency across patches. Extensive experiments show that our method produces high-quality edits, paving the way toward high-resolution content creation.
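The patch-wise processing described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it only shows how a high-resolution image can be split into non-overlapping tiles, edited tile by tile, and reassembled. The function names (`split_into_patches`, `merge_patches`, `edit_patchwise`), the `edit_fn` callback, and the default patch size of 512 are hypothetical, and the detail transfer and synchronization steps are omitted entirely.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W, C) array into non-overlapping patch x patch tiles.

    Returns a list of ((top, left), tile) pairs in row-major order.
    Assumes H and W are multiples of `patch` (a simplification of this sketch).
    """
    h, w = image.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append(((top, left), image[top:top + patch, left:left + patch]))
    return tiles


def merge_patches(tiles, shape, dtype):
    """Reassemble ((top, left), tile) pairs into a full image of the given shape."""
    out = np.zeros(shape, dtype=dtype)
    for (top, left), tile in tiles:
        ph, pw = tile.shape[:2]
        out[top:top + ph, left:left + pw] = tile
    return out


def edit_patchwise(image, edit_fn, patch=512):
    """Apply a hypothetical per-tile editor `edit_fn` and stitch the results.

    `edit_fn` stands in for a low-resolution editing model; it must return
    a tile of the same shape as its input.
    """
    edited = [(pos, edit_fn(tile)) for pos, tile in split_into_patches(image, patch)]
    return merge_patches(edited, image.shape, image.dtype)
```

With these assumptions, a 2048 x 2048 image and `patch=512` yield 16 tiles. Editing each tile independently like this is exactly where seams can appear at tile borders, which is the kind of inconsistency a cross-patch synchronization strategy is meant to address.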
