Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting

Beomyoung Kim, Myeong Yeon Yi, Joonsang Yu, Young Joon Yoo, Sung Ju Hwang

TL;DR

This work tackles the high annotation cost of matte labels in human matting and the domain generalization gap of trimap-free models trained on synthetic data. It introduces Weakly Semi-Supervised Human Matting (WSSHM) and a simple yet effective Matte Label Blending (MLB) strategy within a two-stage teacher-student framework: a teacher learns boundary fidelity from synthetic matte data, and a student learns robust matte prediction from natural segmentation data guided by MLB, which blends teacher boundaries with coarse segmentation. The approach yields strong improvements in real-world robustness and boundary detail with modest matte data, while enabling real-time performance on lightweight backbones and transferability across multiple matting architectures. Overall, MLB demonstrates a practical path to label-efficient matting with strong domain generalization and boundary quality, pointing to future work leveraging unlabeled data through pseudo-labeling.

Abstract

This paper presents a new practical training method for human matting, a task that demands delicate pixel-level identification of human regions and highly laborious annotation. To reduce the annotation cost, most existing matting approaches rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images introduces a new domain generalization challenge for natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which leverages a small amount of expensive matte labels and a large amount of budget-friendly segmentation labels to save annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, named Matte Label Blending (MLB), that selectively passes only the beneficial knowledge of the segmentation and matte data to the matting model. Extensive experiments with detailed analysis demonstrate that our method can substantially improve the robustness of the matting model using a small amount of matte data and a large amount of segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy with breakneck inference speed (328 FPS on an NVIDIA V100 GPU). The implementation code is available at https://github.com/clovaai/WSSHM.

Paper Structure

This paper contains 32 sections, 3 equations, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: (a) The forms of segmentation label, matte label, and trimap. The matte label requires much more sophisticated annotations than the segmentation label. The gray-colored region in the trimap is the unknown region where details need to be estimated. (b) An example of synthetic matte data that is unnatural and lacks real-world context.
  • Figure 2: Qualitative comparisons of models trained with (b) only segmentation data, (c) only matte data, and (d) both segmentation and matte data with our training method. The first and second rows show results on natural (real-world) images, and the third and fourth rows show results on synthetic images.
  • Figure 3: Illustration of the definition of Weakly Semi-Supervised Human Matting (WSSHM). The goal of WSSHM is to train a trimap-free matting model using a small amount of expensive matte data and a large amount of economical segmentation data.
  • Figure 4: Overview of the proposed two-step training pipeline: (1st step) teacher network training with synthetic matte data and (2nd step) student network training with natural segmentation data. The Matte Label Blending (MLB) mechanism combines boundary detail representation with domain generalization to achieve the goal of WSSHM.
  • Figure 5: Qualitative samples of Matte Label Blending demonstrating its ability to generate high-quality blended matte labels by effectively combining boundary detail representation from teacher network outputs with the coarse-level knowledge from ground-truth segmentation labels.
  • ...and 8 more figures