MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li
TL;DR
MV-MOS tackles 3D moving object segmentation by fusing motion cues from two 2D views, bird's eye view (BEV) and range view (RV), alongside a semantic BEV branch that guides the motion features. The architecture combines a multi-view motion branch, a semantic branch, and a Mamba SS2D-based adaptive fusion module to synthesize rich motion-semantic features, mitigating the information loss caused by 3D-to-2D projection. On SemanticKITTI-MOS, MV-MOS achieves state-of-the-art performance, with $IoU_{val}=78.5\%$ on the validation set and $IoU_{test}=80.6\%$ on the test set, and ablation studies confirm the contribution of each component (dual-view fusion, semantic guidance, and Mamba fusion). The approach balances accuracy and efficiency, delivering competitive real-time performance for practical autonomous driving and robotics applications.
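For readers unfamiliar with the metric, the IoU figures above are usually understood as the intersection-over-union of the moving class, $IoU = TP / (TP + FP + FN)$, computed over per-point labels. The sketch below is only an illustration of that formula and not the benchmark's evaluation code; the function name `moving_iou`, the label encoding (1 = moving, 0 = static), and the toy data are assumptions made for this example.

```python
def moving_iou(pred, label, moving_class=1):
    """IoU of the moving class: TP / (TP + FP + FN).

    `pred` and `label` are equal-length sequences of per-point class ids.
    The encoding 1 = moving, 0 = static is an assumption for this sketch.
    """
    tp = fp = fn = 0
    for p, g in zip(pred, label):
        if p == moving_class and g == moving_class:
            tp += 1  # moving point correctly predicted as moving
        elif p == moving_class:
            fp += 1  # static point wrongly predicted as moving
        elif g == moving_class:
            fn += 1  # moving point missed by the prediction
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Toy example with six points (hypothetical data).
pred = [1, 1, 0, 0, 1, 0]
label = [1, 0, 0, 1, 1, 0]
iou = moving_iou(pred, label)  # 2 TP, 1 FP, 1 FN
```

Static points that are correctly predicted as static do not enter the formula, so the score is driven by the comparatively rare moving points.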
Abstract
Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features while avoiding information loss during 3D-to-2D projection remains a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) that fuses motion-semantic features from different 2D representations of point clouds. To effectively exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with the motion features and provide effective guidance for the motion branches. We validate the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark.
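To make the two 2D representations concrete, the following minimal sketch projects individual LiDAR points into a range-view pixel (spherical projection) and a BEV cell. It is not the authors' implementation. The image size (64 x 2048), the vertical field of view (+3° to -25°), the Cartesian BEV grid with 0.5 m cells over ±50 m, and the function names `range_view_pixel` and `bev_cell` are all assumptions chosen for illustration; the paper may use a different BEV parameterization.

```python
import math


def range_view_pixel(x, y, z, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection of one point to (row, col, range).

    Sensor parameters are assumed values, not taken from the paper.
    Returns None for a point at the sensor origin.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None
    yaw = -math.atan2(y, x)   # horizontal angle
    pitch = math.asin(z / r)  # vertical angle
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    col = 0.5 * (yaw / math.pi + 1.0) * W
    row = (1.0 - (pitch - fov_down) / fov) * H
    # Clamp to valid image indices.
    col = min(W - 1, max(0, int(math.floor(col))))
    row = min(H - 1, max(0, int(math.floor(row))))
    return row, col, r


def bev_cell(x, y, grid_range=50.0, cell_size=0.5):
    """Cartesian BEV cell index (ix, iy); None if the point is off the grid."""
    if not (-grid_range <= x < grid_range and -grid_range <= y < grid_range):
        return None
    return int((x + grid_range) / cell_size), int((y + grid_range) / cell_size)


# Two points stacked vertically at the same (x, y) position (hypothetical data).
top = (10.0, 0.0, 0.0)
bottom = (10.0, 0.0, -2.0)
rv_top, rv_bottom = range_view_pixel(*top), range_view_pixel(*bottom)
bev_top, bev_bottom = bev_cell(*top[:2]), bev_cell(*bottom[:2])
```

In this toy case the two points fall into the same BEV cell, so their height difference is lost there, while the range view keeps them in different rows. Conversely, the range view compresses distance into a single value per pixel, whereas BEV preserves metric layout on the ground plane. This is one way to see why the two views carry complementary information.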
