FunEditor: Achieving Complex Image Edits via Function Aggregation with Diffusion Models
Mohammadreza Samadi, Fred X. Han, Mohammad Salameh, Hao Wu, Fengyu Sun, Chunhua Zhou, Di Niu
TL;DR
FunEditor is a diffusion-model editing framework that performs complex, localized image edits by aggregating simple, atomic editing functions. It learns trainable task tokens and uses cross-attention masking to apply multiple edits simultaneously to specified regions, which enables efficient four-step inference with no energy-guided optimization. On the COCOEE and ReS datasets, the approach produces superior object movement and pasting results, scoring higher on image-quality metrics at substantially lower latency than both training-based and training-free baselines. By leveraging function aggregation, FunEditor provides a data-efficient, scalable path to complex image editing that preserves region fidelity and object appearance during composition. The method is compatible with existing few-step diffusion backbones, which makes it practical for real-time or interactive editing workflows.
Abstract
Diffusion models have demonstrated outstanding performance in generative tasks, making them ideal candidates for image editing. Recent studies show that they can apply desired edits effectively by following textual instructions, yet two key challenges remain. First, these models struggle to apply multiple edits simultaneously and instead rely on sequential processing, which is computationally inefficient. Second, relying on textual prompts to determine the editing region can lead to unintended alterations to the image. We introduce FunEditor, an efficient diffusion model designed to learn atomic editing functions and to perform complex edits by aggregating them. Complex editing tasks such as object movement are handled by aggregating multiple functions and applying them simultaneously to specific areas. Our experiments demonstrate that FunEditor significantly outperforms recent inference-time optimization methods and fine-tuned models on complex tasks like object movement and object pasting, quantitatively across various metrics, in visual comparisons, or both. Moreover, with only 4 steps of inference, FunEditor achieves 5-24x inference speedups over existing popular methods. The code is available at: mhmdsmdi.github.io/funeditor/.
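
The cross-attention masking mentioned in the TL;DR can be illustrated with a toy example. The sketch below is not FunEditor's implementation; it only shows the general idea of letting each pixel attend to a task token solely inside that task's region. The token names (`null`, `edit_a`, `edit_b`), the always-visible `null` token, and the function `masked_cross_attention` are assumptions made for illustration.

```python
import math

def masked_cross_attention(queries, keys, values, allowed):
    """Toy masked cross-attention: query i may attend to token j only if allowed[i][j]."""
    d = len(keys[0])
    out = []
    for q, allow in zip(queries, allowed):
        # Scaled dot-product scores; disallowed tokens get -inf so softmax assigns them weight 0.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, allow)
        ]
        m = max(scores)  # finite as long as at least one token is allowed
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(attn, values)) for c in range(len(values[0]))])
    return out

# Four "pixels" (hypothetical 2-d features) and three tokens: null, edit_a, edit_b.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# One-hot values, so each output row equals the attention weights themselves.
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical region masks, one per task token, over the four pixels.
region_a = [True, True, False, False]
region_b = [False, False, True, False]

# Assumption: the null token is visible everywhere so every row has a valid softmax.
allowed = [[True, a, b] for a, b in zip(region_a, region_b)]

weights = masked_cross_attention(queries, keys, values, allowed)
for pixel, row in enumerate(weights):
    print(pixel, [round(w, 3) for w in row])
```

Because both task tokens are processed in the same attention call, the two edits are applied in one pass rather than one after another, and a pixel outside every region (pixel 3 here) is influenced only by the null token.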
