
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Jing Gu, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Yilin Wang, Xin Eric Wang

TL;DR

This work introduces SwapAnything, a novel framework that swaps any object in an image with a personalized concept given by a reference, while keeping the context unchanged. It also proposes targeted variable swapping, which applies region control over latent feature maps and swaps masked variables to faithfully preserve the context and perform an initial semantic concept swap.
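Targeted variable swapping, as described in the TL;DR, keeps source values outside the target mask and lets the swapped concept fill the inside. The following is a minimal sketch of that masked blend, not the authors' implementation. It assumes a square binary mask at latent resolution (64x64 here), latent maps of shape (C, H, W), and attention outputs of shape (H*W, C) flattened row-major. The function names are hypothetical, and average-pooling the mask down to coarser attention layers is our assumption; this summary does not state how the mask is resized.

```python
import numpy as np

def downsample_mask(mask: np.ndarray, size: int) -> np.ndarray:
    """Average-pool a square (H, H) mask to (size, size); H must be a multiple of size."""
    h = mask.shape[0]
    if mask.shape != (h, h) or h % size != 0:
        raise ValueError("mask must be square with a side divisible by size")
    k = h // size
    # Split each axis into (block index, offset within block) and average over the offsets.
    return mask.reshape(size, k, size, k).mean(axis=(1, 3))

def swap_latent(z_src: np.ndarray, z_gen: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Latent maps (C, H, W) with H == W: generated values inside the mask, source values outside."""
    m = downsample_mask(mask, z_src.shape[-1])[None, :, :]  # broadcast over channels
    return m * z_gen + (1.0 - m) * z_src

def swap_tokens(out_src: np.ndarray, out_gen: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Attention outputs (H*W, C), tokens flattened row-major from a square H x W grid."""
    side = int(round(np.sqrt(out_src.shape[0])))
    m = downsample_mask(mask, side).reshape(-1, 1)  # one weight per token
    return m * out_gen + (1.0 - m) * out_src
```

Average pooling gives fractional weights where a coarse token straddles the mask boundary, so the blend is soft there. Thresholding the pooled mask would be an equally plausible design choice.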

Abstract

Effective editing of personal content plays a pivotal role in enabling individuals to express their creativity, weave captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. In this work, we therefore introduce SwapAnything, a novel framework that can swap any object in an image with a personalized concept given by a reference, while keeping the context unchanged. Compared with existing methods for personalized subject swapping, SwapAnything has three unique advantages: (1) precise control of arbitrary objects and parts, rather than only the main subject; (2) more faithful preservation of context pixels; and (3) better adaptation of the personalized concept to the image. First, we propose targeted variable swapping, which applies region control over latent feature maps and swaps masked variables to faithfully preserve the context and perform an initial semantic concept swap. Then, we introduce appearance adaptation to seamlessly adapt the semantic concept to the original image in terms of target location, shape, style, and content during the image generation process. Extensive results from both human and automatic evaluations demonstrate significant improvements of our approach over baseline methods on personalized swapping. Furthermore, SwapAnything shows precise and faithful swapping across single-object, multi-object, partial-object, and cross-domain swapping tasks. SwapAnything also performs well on text-based swapping and on tasks beyond swapping, such as object insertion.
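The abstract lists multi-object swapping among the supported tasks, and the Figure 5 caption says this is done by swapping one object at a time. The following is a minimal sketch of that composition. It assumes a hypothetical single-object interface `swap_one(image, concept, mask)`. In the real pipeline, each call would run inversion, targeted variable swapping, and appearance adaptation. The `paint_swap` stand-in and the concept tokens are purely illustrative.

```python
from typing import Callable, Sequence, Tuple
import numpy as np

# Hypothetical single-object swap interface: (image, concept token, binary mask) -> edited image.
SwapFn = Callable[[np.ndarray, str, np.ndarray], np.ndarray]

def swap_multiple(image: np.ndarray,
                  pairs: Sequence[Tuple[str, np.ndarray]],
                  swap_one: SwapFn) -> np.ndarray:
    """Swap several objects by running a single-object swap once per (concept, mask) pair."""
    result = image
    for concept, mask in pairs:
        # The previous output becomes the source image for the next swap.
        result = swap_one(result, concept, mask)
    return result

def paint_swap(image: np.ndarray, concept: str, mask: np.ndarray) -> np.ndarray:
    """Toy stand-in for one full swap pass: fills the masked region with a per-concept value."""
    values = {"<dog*>": 1.0, "<hat*>": 2.0}  # illustrative concept tokens only
    out = image.copy()  # leave the input image untouched
    out[mask.astype(bool)] = values[concept]
    return out

# Toy usage on an 8x8 single-channel "image" with two non-overlapping target masks.
image = np.zeros((8, 8))
m1 = np.zeros((8, 8)); m1[:4, :4] = 1
m2 = np.zeros((8, 8)); m2[4:, 4:] = 1
edited = swap_multiple(image, [("<dog*>", m1), ("<hat*>", m2)], paint_swap)
```

Each pass treats the previous result as the new source image. As a result, the order of the pairs matters when masks overlap, because a later swap overwrites an earlier one in the shared region.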

Paper Structure (30 sections, 11 equations, 14 figures, 3 tables)

Figures (14)

  • Figure 1: SwapAnything results on various personalized image swapping tasks. SwapAnything precisely replaces arbitrary objects in a source image with a personalized reference and achieves high-fidelity swapping results without altering any context pixels. We demonstrate its general swapping ability on single-object, multi-object, partial-object, and cross-domain swapping tasks.
  • Figure 2: Overview of SwapAnything swapping an object in a source image ($I_{src}$) with a personalized concept ($<{*}>$) to obtain the target image ($I_{target}$). The personalized concept is first converted into textual space and treated as the concept appearance. Meanwhile, the source image is inverted into initial noise to obtain U-Net variables (latent features, attention maps, and attention outputs). Targeted variable swapping preserves the context pixels of the source image, and the appearance adaptation process then uses these variables to integrate the concept into the target image.
  • Figure 3: Swapping process in SwapAnything. The left part shows the correspondence between the latent feature $z$ and the generated image. The right part shows the procedure of targeted variable manipulation in the U-Net diffusion process.
  • Figure 4: Qualitative comparison with different baselines. Note that the baseline methods were already equipped with some components of SwapAnything for precise control of the swapping region; see the implementation details subsection for details.
  • Figure 5: Multi-object swapping results of SwapAnything. Our method swaps multiple objects by swapping one object at a time. Red circles mark the target objects to be replaced, and the same color marks a concept and its corresponding target.
  • ...and 9 more figures
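The Figure 2 caption says the source image is inverted into initial noise to obtain the U-Net variables used later for swapping. The caption does not name the inversion method. The following sketch assumes deterministic DDIM inversion, a common choice for this kind of editing, and shows the latent update while keeping the whole trajectory. `eps_model` is a hypothetical placeholder for the U-Net noise predictor. In a real pipeline, attention maps and attention outputs would also be cached from that network at each step; this toy omits them.

```python
from typing import Callable, List, Sequence
import numpy as np

def ddim_invert(z0: np.ndarray,
                alphas_cumprod: np.ndarray,
                timesteps: Sequence[int],
                eps_model: Callable[[np.ndarray, int], np.ndarray]) -> List[np.ndarray]:
    """Deterministic DDIM inversion from a clean latent z0 toward noise.

    timesteps must be increasing; z0 is treated as the latent at timesteps[0].
    Returns one latent per timestep, starting with z0 itself.
    Uses the usual inversion approximation eps(z_next, t_next) ~ eps(z_cur, t_cur).
    """
    trajectory = [z0]
    z = z0
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(z, t_cur)
        # Predict the clean latent, then re-noise it to the next (noisier) timestep.
        z0_pred = (z - np.sqrt(1.0 - a_cur) * eps) / np.sqrt(a_cur)
        z = np.sqrt(a_next) * z0_pred + np.sqrt(1.0 - a_next) * eps
        trajectory.append(z)
    return trajectory

# Toy usage: linear beta schedule and a placeholder predictor standing in for the U-Net.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
timesteps = list(range(0, 1000, 100))
z0 = np.random.default_rng(0).standard_normal((4, 64, 64))
trajectory = ddim_invert(z0, alphas_cumprod, timesteps, lambda z, t: np.zeros_like(z))
```

With the zero placeholder, each step only rescales the latent by sqrt(alphas_cumprod[t_next] / alphas_cumprod[t_cur]), which makes the toy easy to check. A trained noise predictor is needed for the final latent to resemble Gaussian noise.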