POBEVM: Real-time Video Matting via Progressively Optimize the Target Body and Edge
Jianming Xian
TL;DR
POBEVM tackles real-time, trimap-free video matting by optimizing the target body and the target edge separately via the SOBE block. The network uses an encoder–decoder with attention-guided SOBE blocks and an optional Deep Guided Filter, plus an Edge-L1-Loss to strengthen edge predictions. Evaluations on the VideoMatte240K (VM) and Distinctions-646 (D646) datasets show state-of-the-art edge and overall matting performance among trimap-free methods; segmentation experiments indicate that SOBE also generalizes to refining edges in camouflaged-object segmentation. The method reduces reliance on manual trimaps while producing sharper edges suitable for downstream video editing.
Abstract
Approaches based on deep convolutional neural networks (CNNs) have achieved great performance in video matting. Many of these methods produce accurate alpha estimates for the target body but typically yield fuzzy or incorrect target edges. There are two main causes: 1) current methods treat the target body and the target edge indiscriminately; 2) the target body dominates the whole target, while the target edge accounts for only a tiny proportion of it. For the first problem, we propose a CNN-based module that separately optimizes the matting target body and edge (SOBE). On this basis, we introduce a real-time, trimap-free video matting method via progressively optimizing the matting target body and edge (POBEVM) that is much lighter than previous approaches and achieves significant improvements in the predicted target edge. For the second problem, we propose an Edge-L1-Loss (ELL) function that focuses our network on the matting target edge. Experiments demonstrate that our method outperforms prior trimap-free matting methods on both the Distinctions-646 (D646) and VideoMatte240K (VM) datasets, especially in edge optimization.
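To illustrate why an edge-focused loss helps when the target body dominates the image, the following is a minimal sketch of an edge-restricted L1 loss compared with a plain L1 loss. It is not the ELL formulation itself, which is not given in this abstract. The edge definition used here (pixels whose ground-truth alpha is partially transparent), the thresholds `low` and `high`, and the function names `edge_mask_from_alpha`, `edge_l1_loss`, and `plain_l1_loss` are all assumptions made for illustration.

```python
import numpy as np


def edge_mask_from_alpha(alpha_gt, low=0.01, high=0.99):
    """Assumed edge definition: pixels with partially transparent ground-truth alpha."""
    return ((alpha_gt > low) & (alpha_gt < high)).astype(np.float64)


def plain_l1_loss(alpha_pred, alpha_gt):
    """Mean absolute error over every pixel (body, edge, and background alike)."""
    return float(np.abs(alpha_pred - alpha_gt).mean())


def edge_l1_loss(alpha_pred, alpha_gt, eps=1e-6):
    """Mean absolute error over edge pixels only (hypothetical stand-in for ELL)."""
    mask = edge_mask_from_alpha(alpha_gt)
    abs_err = np.abs(alpha_pred - alpha_gt)
    # Normalize by the number of edge pixels so a thin edge is not diluted
    # by the much larger body and background regions.
    return float((abs_err * mask).sum() / (mask.sum() + eps))


# Toy 1x6 alpha row: background, one soft edge pixel, then solid body.
alpha_gt = np.array([[0.0, 0.0, 0.5, 1.0, 1.0, 1.0]])
# The prediction is perfect everywhere except at the single edge pixel.
alpha_pred = np.array([[0.0, 0.0, 0.9, 1.0, 1.0, 1.0]])

loss_plain = plain_l1_loss(alpha_pred, alpha_gt)
loss_edge = edge_l1_loss(alpha_pred, alpha_gt)
```

In this toy case the plain L1 loss averages the single edge error over all six pixels, whereas the edge-restricted loss averages it over the one edge pixel, so the same mistake yields a much larger training signal. In practice such a term would presumably be combined with a loss over the full alpha matte rather than used alone.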
