DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

TL;DR

DiffuMatting addresses the scarcity of matting-level annotations by training a diffusion model on a fixed green-screen canvas using Green100K, enabling “anything matting.” It introduces a green-background control loss and a detailed-transition-boundary loss, plus a latent-space matting head to produce precise matting annotations, all governed by a joint objective. The approach yields substantial improvements in matting accuracy (e.g., relative MSE reductions of 15.4% for General Object Matting and 11.4% for Portrait Matting) and demonstrates compatibility with community LoRAs and ControlNet for controllable design and image composition. By providing a scalable matting-data factory and robust generalization beyond Green100K, DiffuMatting offers a practical pathway to high-quality matting annotations across diverse objects and applications, while acknowledging limitations and addressing potential misuse with safeguards.

Abstract

Because obtaining highly accurate matting annotations is difficult and labor-intensive, only a limited number of highly accurate labels are publicly available. To tackle this challenge, we propose DiffuMatting, which inherits the strong everything-generation ability of diffusion models and adds the power of "matting anything". DiffuMatting can (1) act as an anything-matting factory with highly accurate annotations, and (2) work with community LoRAs or various conditional control approaches to achieve community-friendly art design and controllable generation. Specifically, inspired by green-screen matting, we aim to teach the diffusion model to paint on a fixed green-screen canvas. To this end, we first collect a large-scale green-screen dataset (Green100K) as the training dataset for DiffuMatting. Second, a green background control loss is proposed to keep the drawing board a pure green color, so that foreground and background can be distinguished. To ensure the synthesized object has more edge details, a detailed-enhancement loss of transition boundary is proposed as a guideline to generate objects with more complicated edge structures. To generate the object and its matting annotation simultaneously, we build a matting head that performs green color removal in the latent space of the VAE decoder. DiffuMatting shows several potential applications (e.g., matting-data generator, community-friendly art design, and controllable generation). As a matting-data generator, DiffuMatting synthesizes general object and portrait matting sets, reducing the relative MSE error by 15.4% in General Object Matting and 11.4% in Portrait Matting tasks. The dataset is released on our project page at https://diffumatting.github.io.
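
The green-screen idea in the abstract can be made concrete with a small sketch. The code below is not the paper's matting head, its GreenPost step, or any of its losses; it is a classic chroma-key heuristic, shown only to illustrate why a pure green canvas makes foreground and background easy to separate. The function names (`green_key_alpha`, `alpha_matte`, `composite`) and the keying rule itself are assumptions introduced for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b), each channel in [0, 1]


def green_key_alpha(pixel: Pixel) -> float:
    """Hypothetical keying rule: opacity drops as green dominates red and blue."""
    r, g, b = pixel
    green_excess = g - max(r, b)
    # Clamp the excess to [0, 1]; pure green gives alpha 0, non-green gives alpha 1.
    return 1.0 - min(max(green_excess, 0.0), 1.0)


def alpha_matte(image: List[List[Pixel]]) -> List[List[float]]:
    """Apply the keying rule to every pixel of a row-major image."""
    return [[green_key_alpha(p) for p in row] for row in image]


def composite(fg: Pixel, bg: Pixel, alpha: float) -> Pixel:
    """Standard compositing equation: alpha * foreground + (1 - alpha) * background."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# A 2x2 toy image: pure green canvas pixels plus a red and a gray object pixel.
image = [
    [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.5, 0.5), (0.0, 1.0, 0.0)],
]
matte = alpha_matte(image)
```

With a matte in hand, `composite` can place the extracted object on any new background. A rule this simple fails on semi-transparent or green-tinted objects, which is the kind of case a learned matting annotation is meant to handle.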

Paper Structure (19 sections, 11 equations, 13 figures, 2 tables)

This paper contains 19 sections, 11 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Green-screen objects with matting-level annotations generated by DiffuMatting, including difficult cases such as nets, grids, and semi-transparent objects, and extending to almost any class (e.g., Transportation, Architecture, Toy).
  • Figure 2: Visual performance of our DiffuMatting on green-screen object generation in comparison with the SOTA Midjourney and SD-XL models, which have difficulty consistently generating objects on a pure green screen.
  • Figure 3: An overview of our DiffuMatting network. DiffuMatting mainly consists of Green100K data collection and captioning, green-screen detailed object synthesis assisted by the green-background control loss $\mathcal{L}_{g}$ and the detailed-enhancement loss of transition boundary $\mathcal{L}_{detail}$, and matting-level annotation refinement via a matting head in the VAE latent space constrained by $\mathcal{L}_{latent}$ and GreenPost.
  • Figure 4: Visual performance of our DiffuMatting on green-screen-based object generation in comparison with LoRA and DreamBooth fine-tuning on our Green100K.
  • Figure 5: Matting-level annotation analysis. Our matting-level annotation (blue box) vs. the pixel-level mask generated by DiffuMask [wu2023diffumask] (red box) for the classes dog and owl (bird). Each class-object generation by DiffuMask requires fine-tuning AffinityNet for that specific class to obtain post-processed annotation results.
  • ...and 8 more figures