Image Editing with Diffusion Models: A Survey
Jia Wang, Jie Hu, Xiaoqi Ma, Hanghang Ma, Xiaoming Wei, Enhua Wu
TL;DR
This survey systematically analyzes image editing with diffusion models by defining editing tasks, classifying editing methods into inversion-, fine-tuning-, and adapter-based approaches, and detailing evaluation metrics, benchmarks, and dataset construction. It highlights how inversion-based methods balance preserving information from the source image against introducing new content, how fine-tuning strategies enable generalizable or task-specific edits, and how adapters provide flexible, parameter-efficient control. The paper consolidates evaluation pipelines that use both objective metrics and ML-based scoring, and catalogs extraction- and generation-based datasets, offering a practical roadmap for researchers selecting methods, datasets, and evaluation protocols. It also discusses current challenges and proposes directions toward multimodal and multi-turn editing, aiming to advance robust, scalable diffusion-based image editing in real-world applications.
Abstract
With deeper exploration of diffusion models, developments in the field of image generation have triggered a boom in image creation. As the quality of images generated by base models continues to improve, so does the demand for downstream applications such as image editing. In recent years, many remarkable works have realized a wide variety of editing effects. However, the wide variety of editing types and the diversity of editing approaches have made it difficult for researchers to establish a comprehensive view of the development of this field. In this survey, we summarize the image editing field from four aspects: task definition, method classification, result evaluation, and editing datasets. First, we provide a definition of image editing, which in turn leads to a variety of editing task forms from the perspective of operation parts and manipulation actions. Subsequently, we categorize and summarize the methods for implementing editing into three categories: inversion-based, fine-tuning-based, and adapter-based. In addition, we organize the currently used metrics, the available datasets, and the corresponding construction methods. Finally, we present some visions for the future development of the image editing field based on the preceding summaries.
