BlobCtrl: Taming Controllable Blob for Element-level Image Editing

Yaowei Li, Lingen Li, Zhaoyang Zhang, Xiaoyu Li, Guangzhi Wang, Hongxiang Li, Xiaodong Cun, Ying Shan, Yuexian Zou

TL;DR

This work presents BlobCtrl, a framework for element-level image editing built on a probabilistic blob-based representation. By disentangling layout from appearance, it affords fine-grained, controllable manipulation of object-level elements.

Abstract

As user expectations for image editing continue to rise, the demand for flexible, fine-grained manipulation of specific visual elements presents a challenge for current diffusion-based methods. In this work, we present BlobCtrl, a framework for element-level image editing based on a probabilistic blob-based representation. Treating blobs as visual primitives, BlobCtrl disentangles layout from appearance, affording fine-grained, controllable object-level manipulation. Our key contributions are twofold: (1) an in-context dual-branch diffusion model that separates foreground and background processing, incorporating blob representations to explicitly decouple layout and appearance, and (2) a self-supervised disentangle-then-reconstruct training paradigm with an identity-preserving loss function, along with tailored strategies to efficiently leverage blob-image pairs. To foster further research, we introduce BlobData for large-scale training and BlobBench, a benchmark for systematic evaluation. Experimental results demonstrate that BlobCtrl achieves state-of-the-art performance in a variety of element-level editing tasks, such as object addition, removal, scaling, and replacement, while maintaining computational efficiency. Project Webpage: https://liyaowei-stu.github.io/project/BlobCtrl/

Paper Structure

This paper contains 42 sections, 15 equations, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Blob Formula. A blob can be represented in two equivalent forms: geometrically as an ellipse and statistically as a 2D Gaussian distribution. The two forms are exactly equivalent and interchangeable.
  • Figure 2: Overview of BlobCtrl. Our framework employs a dual-branch architecture: a foreground branch for identity encoding and a background branch for scene preservation and fusion. Inputs are concatenated in an in-context manner (Sec. \ref{sec:model_architecture}), and the model is trained using the proposed strategy (Sec. \ref{sec:self-supervised}).
  • Figure 3: Element-level editing comparison across methods. (a) General Methods supporting diverse element-level operations; (b) Translation-only Methods limited to point-based object relocation. Please zoom in to view source images and manipulation instructions in detail.
  • Figure 4: Foreground–Background Fusion Ablation. Effect of fusion step ratio $t_\tau$, fusion weight $\omega$, and foreground inputs $\bm{z}_1$, $\bm{F}_1$ on identity preservation and semantic alignment, showing flexible control and diverse outputs.
  • Figure 5: Ablation of Identity Preservation Loss. Results of full-image denoising loss and foreground branch outputs.
  • ...and 7 more figures
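
The ellipse/Gaussian equivalence described in the Figure 1 caption can be illustrated with a minimal sketch. The parameterization below is an assumption, not taken from the paper: the ellipse is given by a center, semi-axes a >= b > 0, and a rotation angle theta, and it is identified with the 1-sigma contour of a Gaussian whose covariance is R diag(a^2, b^2) R^T. The function names are hypothetical.

```python
import math

def ellipse_to_gaussian(cx, cy, a, b, theta):
    """Map an ellipse to a 2D Gaussian (mean, covariance).

    Assumed convention (not from the paper): the ellipse is the 1-sigma
    contour, so Sigma = R(theta) @ diag(a^2, b^2) @ R(theta)^T.
    """
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * a * c * c + b * b * s * s
    syy = a * a * s * s + b * b * c * c
    sxy = (a * a - b * b) * c * s
    return (cx, cy), ((sxx, sxy), (sxy, syy))

def gaussian_to_ellipse(mean, cov):
    """Invert the mapping via the eigen-decomposition of a symmetric 2x2 matrix."""
    (sxx, sxy), (_, syy) = cov
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against rounding below zero
    # Orientation of the major axis, in (-pi/2, pi/2].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return mean[0], mean[1], math.sqrt(lam_major), math.sqrt(lam_minor), theta

# Round trip: ellipse -> Gaussian -> ellipse recovers the original parameters.
mean, cov = ellipse_to_gaussian(10.0, 20.0, 4.0, 2.0, 0.3)
recovered = gaussian_to_ellipse(mean, cov)
```

Under this convention the two forms carry the same five degrees of freedom (two for position, two for scale, one for orientation), which is why they are interchangeable. When a equals b the orientation is undefined, and the inverse mapping returns theta = 0.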