Edit Everything: A Text-Guided Generative System for Images Editing

Defeng Xie; Ruichen Wang; Jian Ma; Chen Chen; Haonan Lu; Dong Yang; Fobo Shi; Xiaodong Lin

TL;DR

The paper tackles text-guided image editing by integrating segmentation-aware editing with diffusion generation. It introduces Edit Everything, a pipeline that combines Segment Anything for segmentation, CLIP for segment ranking, and Stable Diffusion for replacement synthesis, with targeted Chinese-language pretraining to enable native prompts. The approach supports simple edits and complex, iterative prompt processes, delivering high-fidelity results and outperforming open-source baselines on Chinese-language data. Limitations include reliance on unmodified architectures and non-public crawled data, but the work demonstrates practical, language-aware image editing with precise control and broad potential applications.

Abstract

We introduce Edit Everything, a new generative system that takes image and text inputs and produces image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating the requested images. Experiments demonstrate that Edit Everything makes effective use of the visual capabilities of Stable Diffusion by combining it with the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.

Paper Structure (12 sections, 4 figures, 1 table)

This paper contains 12 sections, 4 figures, 1 table.

Introduction
Methods
Architecture
Pre-training Data
Implementation
Main Results
Simple Prompts
Complicated Prompts
Further Comparisons
Limitations
Conclusion
Acknowledgements

Figures (4)

Figure 1: The network architecture of Edit Everything. The original image is first split into segments by the Segment Anything Model (SAM). These segments are then ranked against the source prompt, a text that describes the target object and editing style, and the segment with the highest score from our trained CLIP model is chosen as the target. Finally, guided by the target prompt, Stable Diffusion (SD) generates the replacement object for the masked segment.
Figure 2: Text-guided image editing examples created by Edit Everything. Our system detects the dark region indicated by the source prompt and erases it, then applies SD to fill the region based on the target prompt. The system can produce various styles that match the surrounding context seamlessly.
Figure 3: Iteratively replacing objects in an image, step by step, using Edit Everything.
Figure 4: Comparison of images generated by open-source models and by our trained models. Our models support Chinese inputs.
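
The segment-ranking step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes that each SAM segment and the source prompt have already been embedded by a CLIP-style model and that the score is cosine similarity; the caption does not state the exact scoring function, so that choice is an assumption. The function names (`cosine_similarity`, `rank_segments`, `select_target_segment`) and the toy vectors are hypothetical and are not taken from the authors' code. The SAM segmentation and SD inpainting stages are not modeled here.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(
    segment_embeddings: Sequence[Sequence[float]],
    prompt_embedding: Sequence[float],
) -> List[Tuple[int, float]]:
    """Return (segment_index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(embedding, prompt_embedding))
        for index, embedding in enumerate(segment_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_target_segment(
    segment_embeddings: Sequence[Sequence[float]],
    prompt_embedding: Sequence[float],
) -> int:
    """Pick the index of the segment whose embedding best matches the source prompt."""
    if not segment_embeddings:
        raise ValueError("at least one segment is required")
    return rank_segments(segment_embeddings, prompt_embedding)[0][0]


# Toy stand-ins for CLIP embeddings (assumed values, for illustration only).
toy_segments = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
toy_prompt = [0.0, 1.0, 0.0]
target_index = select_target_segment(toy_segments, toy_prompt)
```

In a full pipeline, the mask of the selected segment would then be handed to an inpainting model together with the target prompt.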
