ParallelEdits: Efficient Multi-object Image Editing

Mingzhen Huang, Jialing Cai, Shan Jia, Vishnu Suresh Lokhande, Siwei Lyu

TL;DR

ParallelEdits tackles multi-aspect text-driven image editing by integrating edits across multiple attributes into diffusion steps through a fixed multi-branch architecture guided by an attention-aggregation mechanism. It introduces aspect grouping to partition edits into $N$ branches and performs inversion-free, branch-calibrated updates with cross-branch interactions to preserve content while editing multiple attributes simultaneously. The PIE-Bench++ dataset is proposed to benchmark multi-aspect editing, and experiments show superior editing accuracy and content preservation compared with state-of-the-art baselines, at a manageable computational cost. Overall, the work advances efficient, scalable multi-attribute editing in diffusion models and provides a robust benchmark for evaluating such methods, while noting remaining limitations and potential safeguards for deployment.

Abstract

Text-driven image synthesis has made significant advancements with the development of diffusion models, transforming how visual content is generated from text prompts. Despite these advances, text-driven image editing, a key area in computer graphics, faces unique challenges. A major challenge is making simultaneous edits across multiple objects or attributes. Applying existing editing methods sequentially for multi-attribute edits increases computational demands and reduces efficiency. In this paper, we address these challenges with significant contributions. Our main contribution is the development of ParallelEdits, a method that seamlessly manages simultaneous edits across multiple attributes. In contrast to previous approaches, ParallelEdits not only preserves the quality of single-attribute edits but also significantly improves the performance of multitasking edits. This is achieved through an innovative attention distribution mechanism and a multi-branch design that operates across several processing heads. Additionally, we introduce the PIE-Bench++ dataset, an expansion of the original PIE-Bench dataset, to better support evaluating image-editing tasks involving multiple objects and attributes simultaneously. This dataset is a benchmark for evaluating text-driven image editing methods in multifaceted scenarios.

Paper Structure (22 sections, 5 equations, 12 figures, 5 tables, 2 algorithms)

This paper contains 22 sections, 5 equations, 12 figures, 5 tables, 2 algorithms.

Figures (12)

  • Figure 1: Multi-aspect text-driven image editing. Multiple edits in one image pose a significant challenge for existing models (such as DirectInversion [ju2023direct] and InfEdit [xu2023inversion]), as their performance degrades with an increasing number of aspects. In contrast, our ParallelEdits can achieve precise multi-aspect image editing in 5 seconds. The symbol $\textcolor{blue}{\boldsymbol{\otimes}}$ denotes a swap action, the symbol $\textcolor{red}{\boldsymbol{\oplus}}$ denotes an object addition action, and the symbol $\textcolor{green}{\boldsymbol{\ominus}}$ denotes an object deletion action. Arrows ($\rightarrow$) on the image highlight the aspects edited by our method.
  • Figure 2: Pipeline. Our method, ParallelEdits, takes a source image, source prompt, and target prompt as input and produces an edited image. The target prompt specifies the edits needed in the source image. Attention maps for all edited aspects are first collected. Aspect Grouping (see Section \ref{sec:aspect_group}) categorizes each aspect into one of $N$ groups (in the above figure, $N=5$). Each group is then assigned a branch, and the branch-level updates are detailed in Section \ref{sec:multi_branch}. Each branch can be viewed as a rigid, non-rigid, or global editing branch. Finally, adjustments to query/key/value at the self-attention and cross-attention layers are made, as illustrated in the figure and described in Section \ref{sec:cross-branch}.
  • Figure 3: Aspects and Aspect Grouping. A text prompt contains multiple independent tokens, only some of which are editable. The editable tokens are known as aspects and are underlined in the above example. Aspects can be added, deleted, or swapped between the source and target prompts. Pairs of source and target aspects are grouped into branches, and the methodology for aspect grouping is explained in Section \ref{sec:aspect_group}.
  • Figure 4: Qualitative results of ParallelEdits. For each pair of images, arrows mark the edits and are labeled with the edit action and the aspect. The last image pair is a failure case of ParallelEdits.
  • Figure 5: Qualitative results comparison. Current methods fail to edit multiple aspects effectively, even when using sequential edits (noted as *). Methods marked with $\star\star$ take additional inputs other than the source image and plain text.
  • ...and 7 more figures
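
The add, delete, and swap actions named in the Figure 1 and Figure 3 captions can be illustrated with a small sketch. This is not the paper's implementation: the `AspectEdit` class, the `summarize` helper, and the example aspects ("cat", "dog", "hat", "tree") are hypothetical names introduced here, and the sketch assumes that source/target aspect pairs have already been identified. It does not perform the attention-based Aspect Grouping into $N$ branches.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AspectEdit:
    """Hypothetical pairing of a source aspect with a target aspect."""
    source: Optional[str]  # aspect text in the source prompt, None if absent
    target: Optional[str]  # aspect text in the target prompt, None if absent

    @property
    def action(self) -> str:
        """Name the edit action implied by the source/target pair."""
        if self.source is None and self.target is None:
            raise ValueError("an aspect edit needs a source or a target aspect")
        if self.source is None:
            return "add"      # aspect appears only in the target prompt
        if self.target is None:
            return "delete"   # aspect appears only in the source prompt
        if self.source == self.target:
            return "keep"     # unchanged aspect, nothing to edit
        return "swap"         # aspect replaced by a different one


def summarize(edits: List[AspectEdit]) -> Dict[str, int]:
    """Count how many edits of each action a prompt pair requires."""
    counts: Dict[str, int] = {}
    for edit in edits:
        counts[edit.action] = counts.get(edit.action, 0) + 1
    return counts


if __name__ == "__main__":
    # Illustrative aspects only; they are not taken from PIE-Bench++.
    edits = [
        AspectEdit("cat", "dog"),   # swap
        AspectEdit(None, "hat"),    # add
        AspectEdit("tree", None),   # delete
    ]
    print([e.action for e in edits])  # ['swap', 'add', 'delete']
    print(summarize(edits))
```

A count like the one returned by `summarize` corresponds to the "number of aspects" that the Figure 1 caption refers to when describing how existing models degrade as edits accumulate.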

Theorems & Definitions (1)

  • Definition 4.1: Aspect