Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors

Zhiyuan Lu, Hao Lu, Hua Huang

TL;DR

This work shows that one can leverage text prompts and the recent Layer Diffusion model to generate high-quality portrait foregrounds and extract latent portrait mattes, and creates a large-scale portrait matting dataset, termed LD-Portrait-20K, with $20,051$ portrait foregrounds and high-quality alpha mattes.

Abstract

Learning effective deep portrait matting models requires training data of both high quality and large quantity. For portrait matting, however, neither requirement is easy to meet. Because the most accurate ground-truth portrait mattes are captured in front of a green screen, harvesting a large-scale portrait matting dataset in the real world is nearly impossible. This work shows that text prompts and the recent Layer Diffusion model can be used to generate high-quality portrait foregrounds and extract latent portrait mattes. These mattes cannot be used directly, however, because they contain significant generation artifacts. Inspired by the connectivity prior observed in portrait images, namely that the border of a portrait foreground always appears connected, a connectivity-aware approach is introduced to refine the portrait mattes. Building on this, a large-scale portrait matting dataset, termed LD-Portrait-20K, is created with $20,051$ portrait foregrounds and high-quality alpha mattes. Extensive experiments demonstrate the value of LD-Portrait-20K: models trained on it significantly outperform those trained on other datasets. In addition, comparisons with a chroma keying algorithm and an ablation study on dataset capacity further confirm the effectiveness of the proposed matte creation approach. The dataset also contributes to state-of-the-art video portrait matting, implemented with simple video segmentation and a trimap-based image matting model trained on this dataset.
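
To make the connectivity prior concrete, the following is a minimal sketch and not the authors' refinement procedure. It assumes an 8-bit alpha matte stored as a list of rows, treats every pixel whose value is neither 0 nor 255 as part of the border (transition) region, and flags any transition pixel that is not attached to the largest 8-connected component. The function names, the use of 8-connectivity, and the "keep the largest component" rule are all illustrative assumptions.

```python
from collections import deque

def transition_mask(alpha):
    """True where the 8-bit alpha value is neither 0 nor 255."""
    return [[0 < a < 255 for a in row] for row in alpha]

def connected_components(mask):
    """Group True pixels into 8-connected components (lists of (y, x))."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited transition pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def flag_disconnected(alpha):
    """Transition pixels outside the largest component (assumed rule)."""
    comps = connected_components(transition_mask(alpha))
    if not comps:
        return []
    largest = max(comps, key=len)
    return sorted(p for c in comps if c is not largest for p in c)

# Toy matte: a connected soft border around one opaque pixel,
# plus one stray semi-transparent pixel at row 3, column 5.
alpha = [
    [0,   0,   0,   0, 0,  0],
    [0, 128, 200, 128, 0,  0],
    [0, 200, 255, 200, 0,  0],
    [0, 128, 200, 128, 0, 90],
    [0,   0,   0,   0, 0,  0],
]
suspicious = flag_disconnected(alpha)
print(suspicious)  # [(3, 5)]
```

On real images a library routine such as `scipy.ndimage.label` would be a more practical way to label components; the pure-Python version here only keeps the example dependency-free.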

Paper Structure

This paper contains 17 sections, 2 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Overview of efficient portrait matte creation. A represents the process of Connectivity-Aware Alpha Refinement, where the background regions of $F$ (where the alpha value is strictly zero) are padded by Layer Diffusion using an iterative Gaussian filter to avoid aliasing and unnecessary edge patterns. $\bar{\alpha}$ represents regions in the alpha matte where the pixel values are neither 0 nor 255, while the red areas in $\bar{\alpha}_h$ indicate pixels with erroneous alpha values. B illustrates the dataset creation process, where a diverse dataset is generated through a wide range of carefully designed prompts.
  • Figure 2: The framework of the proposed method. The first row of images in Step 2 shows the portrait images generated by Layer Diffusion, while the second row displays their corresponding inverse alpha. In Step 3, the first column shows the RGB images, the second column displays the initial inverse alpha, the third column shows the optimized inverse alpha, and the fourth column presents the final alpha matte.
  • Figure 3: Examples of generated samples to be deleted. The first row shows the generated RGB images, the second row displays their corresponding alpha mattes, and the third row illustrates the inverse alpha values.
  • Figure 4: Display of Connectivity-Aware Alpha Refinement results. The first row shows the RGB portrait images, the second row displays the inverse alpha corresponding to the original alpha matte, and the third and fourth rows show the optimized inverse alpha and alpha matte, respectively.
  • Figure 5: Examples from the LD-Portrait-20K Dataset. 100 examples from the LD-Portrait-20K dataset are presented, including RGB portrait images composited onto different backgrounds and their corresponding alpha mattes, demonstrating the diversity and representativeness of the dataset.
  • ...and 3 more figures
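
The Figure 5 caption mentions portraits composited onto different backgrounds. As a hedged illustration, the sketch below applies the standard alpha compositing equation $I = \alpha F + (1 - \alpha) B$ per pixel. The function name `composite`, the list-of-rows image layout, and the 8-bit alpha scaling are assumptions made for this example, not details taken from the paper.

```python
def composite(fg, bg, alpha):
    """Blend per pixel with I = a*F + (1 - a)*B, where a = alpha / 255."""
    out = []
    for f_row, b_row, a_row in zip(fg, bg, alpha):
        row = []
        for f, b, a in zip(f_row, b_row, a_row):
            w = a / 255.0  # scale 8-bit alpha to [0, 1]
            row.append(tuple(round(w * fc + (1.0 - w) * bc)
                             for fc, bc in zip(f, b)))
        out.append(row)
    return out

# One-row toy image: transparent, opaque, and 20% opaque pixels.
fg = [[(200, 100, 50), (200, 100, 50), (200, 100, 50)]]
bg = [[(0, 0, 100), (0, 0, 100), (0, 0, 100)]]
matte = [[0, 255, 51]]

image = composite(fg, bg, matte)
print(image)  # [[(0, 0, 100), (200, 100, 50), (40, 20, 90)]]
```

With alpha 0 the background shows through unchanged, with alpha 255 the foreground replaces it, and intermediate values mix the two in proportion.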