Image Editing with Diffusion Models: A Survey

Jia Wang, Jie Hu, Xiaoqi Ma, Hanghang Ma, Xiaoming Wei, Enhua Wu

TL;DR

This survey systematically analyzes image editing with diffusion models by defining editing tasks, classifying editing methods into inversion-, fine-tuning-, and adapter-based approaches, and detailing evaluation metrics, benchmarks, and dataset construction. It highlights how inversion-based methods balance information preservation and introduction, how fine-tuning strategies enable generalizable or task-specific edits, and how adapters provide flexible, parameter-efficient control. The paper consolidates evaluation pipelines using both objective metrics and ML-based scoring, and catalogs extraction- and generation-based datasets, offering a practical roadmap for researchers to select methods, datasets, and evaluation protocols. It also discusses current challenges and proposes directions toward multimodal and multi-turn editing, aiming to advance robust, scalable diffusion-based image editing in real-world applications.

Abstract

With deeper exploration of diffusion models, developments in the field of image generation have triggered a boom in image creation. As the quality of images generated by base models continues to improve, so does the demand for downstream applications such as image editing. In recent years, many remarkable works have realized a wide variety of editing effects. However, the wide variety of editing types and the diversity of editing approaches have made it difficult for researchers to establish a comprehensive view of the development of this field. In this survey, we summarize the image editing field from four aspects: task definition, method classification, result evaluation and editing datasets. First, we provide a definition of image editing, which in turn leads to a variety of editing task forms from the perspective of operation parts and manipulation actions. Subsequently, we categorize and summarize the methods for implementing editing into three categories: inversion-based, fine-tuning-based and adapter-based. In addition, we organize the currently used metrics, the available datasets and their corresponding construction methods. Finally, we present some visions for the future development of the image editing field based on the preceding summaries.

Paper Structure

This paper contains 31 sections, 2 equations, 13 figures.

Figures (13)

  • Figure 1: An overview of our survey, which includes four main parts: editing tasks, methods classification, results evaluation and editing datasets.
  • Figure 2: Partition of images and corresponding editing tasks. An image can be divided into two primary components: visual content and visual expression, each of which can be further segmented into more detailed concepts. Each editing task can be regarded as the manipulation of these underlying concepts.
  • Figure 3: Examples of feature images and natural images.
  • Figure 4: Different instruction types. Text instructions are suitable for general editing scenarios. Feature image instructions enable more fine-grained editing. In-context learning can convey different editing modes.
  • Figure 5: Examples of different editing tasks. For simplicity, we standardized the source images for most editing examples.
  • ...and 8 more figures