
ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

TL;DR

ByteEdit tackles diffusion-based image editing challenges by introducing a multi-reward feedback framework with three models: aesthetic, image-text alignment, and pixel-level coherence. It combines perceptual feedback learning, adversarial training, and progressive inference to boost quality, ensure instruction compliance, and accelerate sampling by reducing time steps from 20 to 8 in practice. Large-scale user studies show ByteEdit outperforms Adobe, Canva, and MeiTu on outpainting and inpainting tasks, with notable gains in quality and consistency (e.g., 388% quality, 135% consistency for ByteEdit-Outpainting vs baseline). This work demonstrates the viability of human feedback in editing and offers a scalable route to deploy faster, more faithful generative editors.

Abstract

Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we present ByteEdit, an innovative feedback learning framework meticulously designed to Boost, Comply, and Accelerate Generative Image Editing tasks. ByteEdit seamlessly integrates image reward models dedicated to enhancing aesthetics and image-text alignment, while also introducing a dense, pixel-level reward model tailored to foster coherence in the output. Furthermore, we propose a pioneering adversarial and progressive feedback learning strategy to expedite the model's inference speed. Through extensive large-scale user evaluations, we demonstrate that ByteEdit surpasses leading generative image editing products, including Adobe, Canva, and MeiTu, in both generation quality and consistency. ByteEdit-Outpainting exhibits a remarkable enhancement of 388% and 135% in quality and consistency, respectively, when compared to the baseline model. Experiments also verified that our accelerated models maintain excellent performance in terms of quality and consistency.

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: We introduce ByteEdit, a novel framework that utilizes feedback learning to enhance generative image editing tasks, resulting in outstanding generation performance, improved consistency, enhanced instruction adherence, and accelerated generation speed. To the best of our knowledge, ByteEdit is currently the best-performing and fastest solution in the field of generative editing.
  • Figure 2: ByteEdit formulates a comprehensive feedback learning framework that facilitates aesthetics, image-text matching, consistency, and inference speed.
  • Figure 3: Comparisons with state-of-the-art generative image editing systems in terms of human preference (i.e., GSB). More than 12,000 samples are collected for each task. For simplicity, and to reduce the difficulty of collecting a large number of user opinions, we show volunteers only the images generated by Adobe and by our ByteEdit. "Good" indicates that the image generated by ByteEdit is preferred, and vice versa.
  • Figure 4: Qualitative comparison in inpainting. We highlight key areas with red boxes.
  • Figure 5: Qualitative comparison in outpainting. We highlight key areas with red boxes.
  • ...and 2 more figures
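
The GSB human-preference comparison mentioned in the Figure 3 caption can be illustrated with a short sketch. It assumes GSB stands for Good/Same/Bad votes, where "good" means the ByteEdit image was preferred, "bad" means the competing image was preferred, and "same" means no preference. The function name `gsb_summary`, the vote labels, the `net_preference` figure (good minus bad, as a fraction of all votes), and the ten example votes are all hypothetical and chosen for illustration; they are not taken from the paper or its data.

```python
from collections import Counter


def gsb_summary(votes):
    """Summarize Good/Same/Bad votes as fractions of the total.

    `votes` is an iterable of the strings "good", "same", or "bad".
    The labels and the net-preference figure are illustrative assumptions,
    not the paper's evaluation protocol.
    """
    allowed = {"good", "same", "bad"}
    counts = Counter(votes)

    unknown = set(counts) - allowed
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")

    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one vote is required")

    return {
        "good": counts["good"] / total,
        "same": counts["same"] / total,
        "bad": counts["bad"] / total,
        # Net preference: share of wins minus share of losses.
        "net_preference": (counts["good"] - counts["bad"]) / total,
    }


# Hypothetical votes for illustration only (not data from the paper).
example_votes = ["good"] * 6 + ["same"] * 3 + ["bad"] * 1
summary = gsb_summary(example_votes)
print(summary)
```

Reporting all three fractions rather than a single win rate keeps ties visible, which matters when many volunteers see no difference between two systems.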