DART: Depth-Enhanced Accurate and Real-Time Background Matting

Hanxi Li, Guofeng Li, Bo Li, Lin Wu, Yan Cheng

TL;DR

DART tackles background matting in fixed scenes by using RGB-D data to overcome the weaknesses of RGB-only input under varying illumination and shadows. It distills a lightweight MobileNetV2 backbone from a ResNet50-based BGMv2 model, applies Bayesian depth-informed error correction, refines the alpha matte with RGB-D patches, and optionally performs depth-informed post-matting with ViTMatte. With this pipeline, DART achieves state-of-the-art accuracy with real-time performance (up to 125 FPS on desktop GPUs and 33 FPS on edge devices). A dedicated RGB-D background matting dataset (JXNU-RGBD) supports training and evaluation for this task. The work improves both matting quality and efficiency, which makes deployment practical for applications such as live webcasting and mobile photo editing, and it provides a benchmark for RGB-D background matting research.

Abstract

Matting with a static background, often referred to as "Background Matting" (BGM), has garnered significant attention within the computer vision community due to its pivotal role in practical applications such as webcasting and photo editing. Nevertheless, achieving highly accurate background matting remains a formidable challenge, primarily owing to the limitations inherent in conventional RGB images: they are susceptible to varying lighting conditions and unforeseen shadows. In this paper, we propose DART, a method that leverages the rich depth information provided by RGB-Depth (RGB-D) cameras to enhance background matting performance in real time. First, we adapt the original RGB-based BGM algorithm to incorporate depth information. The resulting model's output is then refined through Bayesian inference that incorporates a background depth prior. The posterior prediction is translated into a "trimap," which is subsequently fed into a state-of-the-art matting algorithm to generate more precise alpha mattes. To ensure real-time matting, a critical requirement for many real-world applications, we distill the backbone of our model from a larger and more versatile BGM network. Our experiments demonstrate the superior performance of the proposed method. Moreover, thanks to the distillation, our method achieves a processing speed of 33 frames per second (fps) on a mid-range edge-computing device. This efficiency underscores DART's potential for deployment in mobile applications.
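The Bayesian refinement and trimap conversion described in the abstract can be illustrated with a minimal per-pixel sketch. The abstract does not give the exact likelihood models or thresholds, so everything below is an assumption made for illustration: background depth is modeled as Gaussian around the recorded background depth, foreground depth as uniform over the sensor range, and the names `posterior_foreground`, `to_trimap`, `refine`, `sigma`, `max_depth`, `lo`, and `hi` are hypothetical rather than taken from the paper.

```python
import math


def posterior_foreground(p_fg, depth, bg_depth, sigma=0.05, max_depth=5.0):
    """Update a foreground probability with a depth observation via Bayes' rule.

    Assumed likelihoods (illustrative, not the paper's formulation):
      background pixel: depth ~ Normal(bg_depth, sigma)
      foreground pixel: depth ~ Uniform(0, max_depth)
    A depth of None marks a depth-unknown pixel; the prior is returned unchanged.
    """
    if depth is None or bg_depth is None:
        return p_fg
    z = (depth - bg_depth) / sigma
    like_bg = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    like_fg = 1.0 / max_depth
    numerator = p_fg * like_fg
    evidence = numerator + (1.0 - p_fg) * like_bg
    # Guard against a zero denominator caused by floating-point underflow.
    return numerator / evidence if evidence > 0.0 else p_fg


def to_trimap(posterior, lo=0.05, hi=0.95):
    """Map a posterior to a trimap label: 0 background, 255 foreground, 128 unknown."""
    if posterior >= hi:
        return 255
    if posterior <= lo:
        return 0
    return 128


def refine(prob_map, depth_map, bg_depth_map):
    """Apply the update and thresholding to 2D lists of equal shape."""
    return [
        [to_trimap(posterior_foreground(p, d, b)) for p, d, b in zip(pr, dr, br)]
        for pr, dr, br in zip(prob_map, depth_map, bg_depth_map)
    ]


if __name__ == "__main__":
    probs = [[0.5, 0.5, 0.5]]          # uncertain network output
    depths = [[2.0, 1.0, None]]        # metres; None = depth-unknown
    background = [[2.0, 2.0, 2.0]]     # pre-captured background depth
    print(refine(probs, depths, background))
```

In this sketch, a pixel whose depth matches the recorded background is pushed toward background, a pixel well in front of it is pushed toward foreground, and a depth-unknown pixel keeps the network's prior and stays in the unknown band of the trimap for the downstream matting stage to resolve.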

Paper Structure (13 sections, 13 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Illustration of the proposed depth-enhanced accurate and real-time background matting (DART). Left: the conventional BGM and the optional post-matting process. Right: our depth-based enhancement approach.
  • Figure 2: The workflow of the proposed DART algorithm. The three stages of DART are shown in the gray, purple, and green regions. Best viewed in color.
  • Figure 3: The proposed RGB-D background matting dataset. Eight scenes are illustrated here, each with one RGB image pair (test and background) and one depth map pair. The dark blue pixels of the depth maps mark regions where depth is unknown. Best viewed in color.
  • Figure 4: Speed and accuracy comparison of the evaluated matting methods. Best viewed in color.