Full-Duplex Strategy for Video Object Segmentation
Ge-Peng Ji, Deng-Ping Fan, Keren Fu, Zhe Wu, Jianbing Shen, Ling Shao
TL;DR
FSNet proposes a full-duplex approach to video object segmentation by enabling bidirectional cross-modal interaction between appearance and motion through Relational Cross-Attention Modules (RCAM) and Bidirectional Purification Modules (BPM). The architecture conducts cross-modal feature fusion in the encoder via RCAM and refines features in the decoder with cascaded BPMs, improving robustness to motion and appearance inconsistencies. Empirical results on DAVIS$_{16}$, MCL, FBMS, SegTrack-V2, and DAVSOD$_{19}$ demonstrate state-of-the-art unsupervised VOS and strong V-SOD performance, with notable gains in metrics such as $S_{\alpha}$, $E_{\xi}^{max}$, and $F_{\beta}^{max}$ and favorable data efficiency. The work provides a unified, efficient framework for both U-VOS and V-SOD, with practical inference speed and publicly available code.
Abstract
Previous video object segmentation approaches mainly rely on simplex solutions between appearance and motion, which limits the efficiency of feature collaboration among and across these two cues. In this work, we study a novel and efficient full-duplex strategy network (FSNet) to address this issue, which applies a better mutual restraint scheme between motion and appearance when exploiting cross-modal features in the fusion and decoding stages. Specifically, we introduce the relational cross-attention module (RCAM) to achieve bidirectional message propagation across embedding sub-spaces. To improve the model's robustness and update the inconsistent features from the spatial-temporal embeddings, we adopt the bidirectional purification module (BPM) after the RCAM. Extensive experiments on five popular benchmarks show that our FSNet is robust to various challenging scenarios (e.g., motion blur, occlusion) and achieves favourable performance against existing state-of-the-art methods in both the video object segmentation and video salient object detection tasks. The project is publicly available at: https://dpfan.net/FSNet.
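To make the idea of bidirectional (full-duplex) message propagation concrete, the following is a minimal NumPy sketch in which each modality re-weights the channels of the other. It is an illustration under stated assumptions, not the RCAM defined in the paper: the function names (`channel_gate`, `bidirectional_exchange`), the pooled channel gate, the random projection matrices, and the residual form are all hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat, weight):
    """Pool a (C, H, W) feature map to (C,) and project it to per-channel gates in (0, 1).

    Hypothetical helper: a generic channel-attention gate, not the paper's exact operator.
    """
    pooled = feat.mean(axis=(1, 2))      # global average pooling -> (C,)
    return sigmoid(weight @ pooled)      # linear projection + sigmoid -> (C,)


def bidirectional_exchange(appearance, motion, w_a2m, w_m2a):
    """Let each cue modulate the other; a simplex scheme would apply only one direction."""
    gate_from_app = channel_gate(appearance, w_a2m)   # appearance -> motion message
    gate_from_mot = channel_gate(motion, w_m2a)       # motion -> appearance message
    # Residual connections keep the original signal alongside the gated one.
    app_out = appearance + appearance * gate_from_mot[:, None, None]
    mot_out = motion + motion * gate_from_app[:, None, None]
    return app_out, mot_out


# Toy inputs standing in for appearance (RGB) and motion (optical-flow) features.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
appearance = rng.standard_normal((C, H, W))
motion = rng.standard_normal((C, H, W))
w_a2m = rng.standard_normal((C, C)) * 0.1   # assumed projection weights (random here)
w_m2a = rng.standard_normal((C, C)) * 0.1

app_out, mot_out = bidirectional_exchange(appearance, motion, w_a2m, w_m2a)
print(app_out.shape, mot_out.shape)
```

In this sketch both outputs depend on both inputs, which is the property the full-duplex strategy emphasises. Dropping either gate would reduce it to a one-directional (simplex) exchange.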
