DART: Depth-Enhanced Accurate and Real-Time Background Matting
Hanxi Li, Guofeng Li, Bo Li, Lin Wu, Yan Cheng
TL;DR
DART tackles background matting under fixed scenes by leveraging RGB-D data to overcome the limitations of RGB images under varying illumination and shadows. By distilling a lightweight MobileNetV2 backbone from a ResNet50-based BGMv2 model, applying Bayesian depth-informed error correction, refining alpha with RGB-D patches, and optionally performing depth-informed post-matting with ViTMatte, DART achieves state-of-the-art accuracy with real-time performance (up to 125 FPS on desktop GPUs and 33 FPS on edge devices). A dedicated RGB-D background matting dataset (JXNU-RGBD) supports training and evaluation for this task. The work demonstrates tangible improvements in matting quality and efficiency, enabling practical deployment for applications like live webcasting and mobile photo editing, and establishes a new benchmark for RGB-D background matting research.
Abstract
Matting with a static background, often referred to as "Background Matting" (BGM), has garnered significant attention within the computer vision community due to its pivotal role in practical applications such as webcasting and photo editing. Nevertheless, achieving highly accurate background matting remains a formidable challenge, primarily owing to the limitations inherent in conventional RGB images, which are susceptible to varying lighting conditions and unforeseen shadows. In this paper, we propose DART, a method that leverages the rich depth information provided by RGB-Depth (RGB-D) cameras to enhance background matting performance in real time. First, we adapt the original RGB-based BGM algorithm to incorporate depth information. The resulting model's output is then refined through Bayesian inference that incorporates a background depth prior. The posterior prediction is translated into a "trimap," which is subsequently fed into a state-of-the-art matting algorithm to generate more precise alpha mattes. To ensure real-time matting, a critical requirement for many real-world applications, we distill the backbone of our model from a larger and more versatile BGM network. Our experiments demonstrate the superior performance of the proposed method. Moreover, thanks to the distillation operation, our method achieves a processing speed of 33 frames per second (fps) on a mid-range edge-computing device. This high efficiency underscores DART's potential for deployment in mobile applications.
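To make the Bayesian refinement and trimap steps described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a per-pixel foreground probability from the matting network, a Gaussian likelihood of the observed depth around a known background depth, and a uniform depth likelihood for foreground pixels. The function names (`depth_posterior`, `to_trimap`), the parameters `sigma` and `d_max`, and the thresholds `lo` and `hi` are all hypothetical choices made for illustration.

```python
import numpy as np

def depth_posterior(p_fg, depth, bg_depth, sigma=0.05, d_max=5.0):
    """Per-pixel posterior foreground probability via Bayes' rule.

    p_fg     : prior foreground probability from the matting network, in [0, 1]
    depth    : observed depth map (metres, float array)
    bg_depth : pre-captured background depth map (metres, float array)
    sigma, d_max are hypothetical parameters, not values from the paper.
    """
    # Assumption: a background pixel's depth is Gaussian around the known background depth.
    lik_bg = np.exp(-0.5 * ((depth - bg_depth) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Assumption: a foreground pixel's depth is uniform over [0, d_max].
    lik_fg = np.full_like(depth, 1.0 / d_max, dtype=np.float64)
    numerator = p_fg * lik_fg
    denominator = numerator + (1.0 - p_fg) * lik_bg
    # Guard against division by zero when both terms vanish.
    return numerator / np.maximum(denominator, 1e-12)

def to_trimap(posterior, lo=0.1, hi=0.9):
    """Map posterior probabilities to a trimap: 0 = background, 128 = unknown, 255 = foreground."""
    trimap = np.full(posterior.shape, 128, dtype=np.uint8)
    trimap[posterior >= hi] = 255
    trimap[posterior <= lo] = 0
    return trimap

# Usage: three pixels with an uninformative network prior of 0.5.
p_fg = np.array([[0.5, 0.5, 0.5]])
depth = np.array([[1.0, 3.0, 2.9]])      # far in front of, on, and near the background
bg_depth = np.full_like(depth, 3.0)
posterior = depth_posterior(p_fg, depth, bg_depth)
trimap = to_trimap(posterior)
```

In this sketch, a pixel whose depth lies well in front of the background is pushed toward foreground, a pixel whose depth matches the background is pushed toward background, and a pixel in between remains in the unknown band that a downstream matting model would resolve.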
