Learning Trimaps via Clicks for Image Matting

Chenyi Zhang, Yihan Hu, Henghui Ding, Humphrey Shi, Yao Zhao, Yunchao Wei

TL;DR

The paper addresses the bottleneck of manual trimap annotation in image matting by proposing Click2Trimap, an interactive model that converts sparse user clicks into high-quality trimaps and alpha mattes. It introduces an Iterative Three-class Training Strategy (ITTS) and a Conditioned Unknown Prioritized Simulation (CUPS) function to simulate realistic user interactions and emphasize unknown-region recall. When paired with trimap-based matting models, Click2Trimap achieves alpha matte accuracy comparable to that of manual trimaps while sharply reducing interaction cost: in the user study, it takes about 5 seconds per image on average. The approach also extends to video matting and is robust across diverse object types and real-world data.

Abstract

Despite significant advancements in image matting, existing models heavily depend on manually-drawn trimaps for accurate results in natural image scenarios. However, the process of obtaining trimaps is time-consuming, lacking user-friendliness and device compatibility. This reliance greatly limits the practical application of all trimap-based matting methods. To address this issue, we introduce Click2Trimap, an interactive model capable of predicting high-quality trimaps and alpha mattes with minimal user click inputs. Through analyzing real users' behavioral logic and characteristics of trimaps, we successfully propose a powerful iterative three-class training strategy and a dedicated simulation function, making Click2Trimap exhibit versatility across various scenarios. Quantitative and qualitative assessments on synthetic and real-world matting datasets demonstrate Click2Trimap's superior performance compared to all existing trimap-free matting methods. Especially, in the user study, Click2Trimap achieves high-quality trimap and matting predictions in just an average of 5 seconds per image, demonstrating its substantial practical value in real-world applications.

Paper Structure (17 sections, 7 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Illustration of different interactive schemes for natural image matting. The three click markers refer to foreground, unknown, and background clicks, respectively. In contrast to previous solutions, we propose a new interactive scheme, i.e., learning trimaps via clicks, which achieves both accuracy and efficiency.
  • Figure 2: Qualitative comparison between our method and the state-of-the-art method, MatteAnything [yao2023matte], on challenging cases.
  • Figure 3: The left part shows the iterative training loop of Click2Trimap. The role of ITTS, described in the right part, is to provide continuous clicks during training. We represent errors by computing false-negative error maps and applying a distance transform to them. Click2Trimap uses CUPS to decide the class of the next click, as formulated in the flowchart.
  • Figure 4: We draw this curve by calculating the MSE of the alpha matte guided by the trimap after each click. The clear downward trend indicates that our method not only predicts trimaps well but can also keep correcting them as clicks are added.
  • Figure 5: We measure the inference time of Click2Trimap (ViT-Huge), ViT-Matte and MatteFormer at different resolutions on a single A100 (40 GB) GPU. Even with a relatively large backbone, Click2Trimap requires only around 80 ms to infer a trimap.
  • ...and 4 more figures
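
The error representation mentioned in the Figure 3 caption (a false-negative error map followed by a distance transform) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. The label encoding, the function names, the brute-force distance transform, the treatment of the image border as non-error, and the rule of clicking the error pixel farthest from any non-error pixel are all assumptions made for the example. The class of the click, which the paper decides with CUPS, is simply passed in here.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical label encoding for a three-class trimap (an assumption).
BG, UNKNOWN, FG = 0, 1, 2

Grid = List[List[int]]


def false_negative_map(gt: Grid, pred: Grid, cls: int) -> List[List[bool]]:
    """True where the ground truth is `cls` but the prediction is not."""
    return [[g == cls and p != cls for g, p in zip(g_row, p_row)]
            for g_row, p_row in zip(gt, pred)]


def distance_transform(mask: List[List[bool]]) -> List[List[float]]:
    """Euclidean distance from each True pixel to the nearest False pixel.

    Brute force, so only suitable for small grids. A virtual one-pixel
    ring just outside the image is treated as False.
    """
    h, w = len(mask), len(mask[0])
    outside = [(r, c)
               for r in range(-1, h + 1)
               for c in range(-1, w + 1)
               if r in (-1, h) or c in (-1, w) or not mask[r][c]]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - o_r, c - o_c)
                                 for o_r, o_c in outside)
    return dist


def next_click(gt: Grid, pred: Grid, cls: int) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the deepest false-negative pixel of `cls`,
    or None when the prediction has no false negatives for that class."""
    dist = distance_transform(false_negative_map(gt, pred, cls))
    best, best_d = None, 0.0
    for r, row in enumerate(dist):
        for c, d in enumerate(row):
            if d > best_d:
                best, best_d = (r, c), d
    return best


# Toy example: a 3x3 unknown region that the prediction misses entirely.
gt = [[BG] * 5] + [[BG, UNKNOWN, UNKNOWN, UNKNOWN, BG] for _ in range(3)] + [[BG] * 5]
pred = [[BG] * 5 for _ in range(5)]
print(next_click(gt, pred, UNKNOWN))  # (2, 2), the centre of the missed region
```

Placing the simulated click deep inside the error region, rather than at its edge, is a common choice in click-based interactive segmentation because it mimics a user clicking on the most obvious mistake. For real image sizes, a library routine such as SciPy's `distance_transform_edt` would replace the brute-force loop.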