Edit Everything: A Text-Guided Generative System for Images Editing
Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
TL;DR
The paper tackles text-guided image editing by integrating segmentation-aware editing with diffusion generation. It introduces Edit Everything, a pipeline that combines Segment Anything for segmentation, CLIP for segment ranking, and Stable Diffusion for replacement synthesis, with targeted Chinese-language pretraining to enable native prompts. The approach supports simple edits and complex, iterative prompt processes, delivering high-fidelity results and outperforming open-source baselines on Chinese-language data. Limitations include reliance on unmodified architectures and non-public crawled data, but the work demonstrates practical, language-aware image editing with precise control and broad potential applications.
Abstract
We introduce a new generative system called Edit Everything, which takes image and text inputs and produces image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating the requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion through the use of the Segment Anything model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.
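The role CLIP plays between Segment Anything and Stable Diffusion can be illustrated with a minimal sketch of the segment-ranking step. This is not the authors' implementation: it assumes that each segment and the text prompt have already been embedded into a shared vector space, and that ranking uses cosine similarity. The function names (`cosine_similarity`, `rank_segments`, `select_edit_target`) and the toy three-dimensional embeddings are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(text_embedding, segment_embeddings):
    """Return (segment_index, score) pairs sorted from best to worst match.

    Assumption: the embeddings stand in for CLIP text and image features;
    real features would come from a CLIP model, not from hand-written lists.
    """
    scored = [
        (index, cosine_similarity(text_embedding, embedding))
        for index, embedding in enumerate(segment_embeddings)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def select_edit_target(text_embedding, segment_embeddings):
    """Pick the index of the segment that best matches the text prompt."""
    ranking = rank_segments(text_embedding, segment_embeddings)
    if not ranking:
        raise ValueError("no segments to rank")
    return ranking[0][0]


# Toy data: one prompt embedding and three candidate segment embeddings.
prompt = [1.0, 0.0, 0.0]
segments = [
    [0.0, 1.0, 0.0],  # unrelated to the prompt
    [0.9, 0.1, 0.0],  # closest to the prompt
    [0.5, 0.5, 0.0],  # partial match
]

target = select_edit_target(prompt, segments)
```

In a full pipeline of this kind, the mask of the selected segment would then be handed to a diffusion model that synthesizes the replacement content; that step is omitted here because it depends on model weights and APIs the text above does not specify.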
