Training Matting Models without Alpha Labels

Wenze Liu; Zixuan Ye; Hao Lu; Zhiguo Cao; Xiangyu Yue

TL;DR

This work tackles the labeling bottleneck in deep image matting by training with only coarse trimap supervision instead of fine alpha mattes. It introduces a distance-based nonlocal prior. The prior is first formulated as a Distance Consistency (DC) loss and then refined into a Directional Distance Consistency (DDC) loss, which preserves the direction of alpha changes so that they align with image structure. This lets alpha values propagate from known regions into transition areas. On AM-2K and P3M-10K the approach performs comparably to fine-label-supervised baselines and sometimes yields even more satisfying results than the human-labelled ground truth, while requiring significantly less annotation effort. By combining semantic learning with well-crafted matting priors, the method offers a practical route toward high-quality matting without dense alpha labeling, with potential extensions to transparent objects and interactive matting.

Abstract

Labelling difficulty has long been a problem in deep image matting. To avoid the need for fine labels, this work explores using rough annotations, such as trimaps that coarsely indicate the foreground and background, as supervision. We show that the cooperation between semantics learned from the indicated known regions and appropriately assumed matting rules can help infer alpha values in transition areas. Inspired by the nonlocal principle in traditional image matting, we build a directional distance consistency loss (DDC loss) at each pixel neighborhood to constrain the alpha values conditioned on the input image. DDC loss forces the distance between similar pixel pairs on the alpha matte to be consistent with their distance on the corresponding image. In this way, alpha values can be propagated from the learned known regions to unknown transition areas. With only images and trimaps, a matting model can be trained under the supervision of a known loss and the proposed DDC loss. Experiments on the AM-2K and P3M-10K datasets show that our paradigm achieves performance comparable to the fine-label-supervised baseline, and it sometimes produces even more satisfying results than the human-labelled ground truth. Code is available at https://github.com/poppuppy/alpha-free-matting.
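A minimal NumPy sketch of the known loss mentioned above follows. It is an illustration, not the authors' code. The function name `known_loss` is hypothetical. The trimap is assumed to be encoded as 0 for background, 1 for foreground, and any other value (for example 0.5) for the unknown transition area. Pixels in the unknown area get no supervision from this term; in the described paradigm they are constrained by the DDC loss instead. How the two terms are weighted is not specified here.

```python
import numpy as np


def known_loss(alpha_pred, trimap):
    """l1 loss restricted to the known regions of a trimap.

    Hypothetical sketch. Assumed trimap encoding: 0 = background,
    1 = foreground, any other value = unknown transition area.
    """
    known = (trimap == 0.0) | (trimap == 1.0)
    if not np.any(known):
        return 0.0  # nothing to supervise in this sample
    # The trimap value itself (0 or 1) serves as the alpha target on known pixels.
    return float(np.mean(np.abs(alpha_pred[known] - trimap[known])))
```

For example, with a trimap row `[0.0, 0.5, 1.0]` and predictions `[0.2, 0.7, 0.9]`, only the first and last pixels count. The middle, unknown pixel is ignored.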

Paper Structure (17 sections, 11 equations, 12 figures, 8 tables)

Introduction
Related Work
Method
Exploration and Analysis
Distance Consistency Loss
Directional Distance Consistency Loss
Experiment
Implementation Details
Main Results
Ablation Study
Conclusion
Appendix
Impact of Trimap Roughness
Training with Segmentation Mask Labels
Effect of $d_{ij}^2$ in Affinity Loss
...and 2 more sections

Figures (12)

Figure 1: Our image matting training paradigm without alpha labels. (a) Unlike prior art that relies on fine alpha matte annotations, we use only coarse trimaps as labels. During training, an $l_1$ loss, termed the known loss, supervises the known regions indicated by the trimap, and a DDC loss constrains the alpha values in transition areas. (b) Even without finely annotated labels, our trained model predicts accurate alpha mattes.
Figure 2: Outputs during training under four supervision policies. (a) With the known loss, the model learns to extend semantics from known to unknown regions. (b) Combining the known loss with an affinity loss helps predict details, but fails to delineate long hair and produces hard segmentation. (c) The proposed DC loss fits long hair well and yields smooth transitions on boundaries, but introduces texture noise on the foreground. (d) The DDC loss eliminates the interior noise with no side effects.
Figure 3: Finding similar pixels in a local window. Centered at a pixel (blue) in the image, the top $K$ most similar pixels (red) are selected in each $K\times K$ local window (pink) according to the Euclidean distance.
Figure 4: The growth-arrest problem of long hair supervised by the affinity loss. Suppose a yellow tiger has a strand of hair whose intensity is constant in the image. We derive that, under such conditions, alpha values that vary approximately linearly or quadratically produce a small loss. Because the already generated coarse mask fixes the starting and ending values at $1$ and $0$, the alpha values get stuck in the state shown in the bottom-right plot.
Figure 5: An example calculation of the proposed DC loss and DDC loss. On a homogeneous hair whose pixel value is $H$, $K$ similar pixels are selected in the $K\times K$ window centered at a given pixel. The corresponding alpha values are then gathered using the selected indices. DC loss first computes the Euclidean distances between the center pixel and the selected similar pixels, both in the image and in the alpha matte, and then forces the two distances to be equal. Building on DC loss, DDC loss eliminates the interior noise by preserving the sign of the alpha distance.
...and 7 more figures
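The neighbor selection in Figure 3 and the DC loss in Figure 5 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation:

- `select_similar` and `dc_loss` are hypothetical names.
- The image is an (H, W, C) float array.
- The window size $K$ is reused as the number of selected neighbors, as the Figure 3 caption describes.
- Border pixels are handled by clamping coordinates.
- The center pixel is excluded from its own candidate set.

The directional, sign-preserving term that turns DC into DDC is not reproduced, because its exact formula is not given here.

```python
import numpy as np


def select_similar(image, k):
    """Pick the k most similar pixels inside each pixel's k x k window.

    Hypothetical helper. image: (H, W, C) float array; k: odd window size >= 3,
    also used as the number of selected neighbors.
    Returns neighbor row indices, column indices and color distances, each (H, W, k).
    """
    assert k >= 3 and k % 2 == 1, "k must be an odd window size >= 3"
    h, w, _ = image.shape
    r = k // 2
    ys, xs = np.mgrid[0:h, 0:w]
    # All window offsets except the center itself (an assumption of this sketch).
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if (dy, dx) != (0, 0)]
    # Clamp coordinates at the image border instead of padding.
    ny = np.stack([np.clip(ys + dy, 0, h - 1) for dy, _ in offsets], axis=-1)
    nx = np.stack([np.clip(xs + dx, 0, w - 1) for _, dx in offsets], axis=-1)
    # Euclidean color distance between each pixel and every window candidate.
    dist = np.linalg.norm(image[ny, nx] - image[:, :, None, :], axis=-1)
    # Keep the k candidates with the smallest distance (stable for ties).
    order = np.argsort(dist, axis=-1, kind="stable")[..., :k]
    ny, nx, dist = (np.take_along_axis(a, order, axis=-1) for a in (ny, nx, dist))
    return ny, nx, dist


def dc_loss(image, alpha, k=3):
    """Distance consistency: mean of |image distance - alpha distance| over selected pairs."""
    ny, nx, img_dist = select_similar(image, k)
    # |alpha_i - alpha_j| for each pixel i and its selected neighbors j.
    alpha_dist = np.abs(alpha[:, :, None] - alpha[ny, nx])
    return float(np.mean(np.abs(img_dist - alpha_dist)))
```

On a uniformly colored region every image distance in this sketch is zero. The loss there reduces to the mean alpha difference between similar neighbors, which pushes the alpha values of a homogeneous structure, such as the hair of value $H$ in Figure 5, toward a common value.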
