MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation

Jintao Cheng; Xingming Chen; Jinxin Liang; Xiaoyu Tang; Xieyuanli Chen; Dachuan Li

TL;DR

MV-MOS tackles 3D moving object segmentation by fusing motion cues from two 2D views, bird's eye view (BEV) and range view (RV), alongside a semantic BEV branch that guides the motion features. The architecture combines a multi-view motion branch, a semantic branch, and a Mamba SS2D-based adaptive fusion module that synthesizes motion-semantic features and mitigates the information loss caused by 3D-to-2D projection. On SemanticKITTI-MOS, MV-MOS achieves state-of-the-art performance, with $IoU_{val}=78.5\%$ on the validation set and $IoU_{test}=80.6\%$ on the test set. Ablation studies confirm the contribution of each component (dual-view fusion, semantic guidance, and Mamba fusion). The approach balances accuracy and efficiency, delivering competitive real-time performance for autonomous driving and robotics applications.
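For readers unfamiliar with the metric, the IoU figures above measure overlap between predicted and ground-truth moving points, i.e. TP / (TP + FP + FN) for the moving class. The following is a minimal sketch of that computation, not the benchmark's official evaluation code; the function name `moving_iou` and the label value `moving_id=1` are hypothetical and chosen only for illustration.

```python
import numpy as np


def moving_iou(pred, label, moving_id=1):
    """IoU of the moving class: TP / (TP + FP + FN).

    `moving_id` is a hypothetical label value for "moving"; adapt it to
    the label mapping of the dataset actually in use.
    """
    pred_moving = np.asarray(pred) == moving_id
    label_moving = np.asarray(label) == moving_id
    tp = np.sum(pred_moving & label_moving)    # moving points found
    fp = np.sum(pred_moving & ~label_moving)   # static points called moving
    fn = np.sum(~pred_moving & label_moving)   # moving points missed
    denom = tp + fp + fn
    return float(tp / denom) if denom > 0 else float("nan")


pred = np.array([1, 1, 0, 0, 1])
label = np.array([1, 0, 0, 1, 1])
print(moving_iou(pred, label))  # 0.5 (TP=2, FP=1, FN=1)
```

Static points that are correctly predicted as static do not enter the ratio, so the score is not inflated by the large static background in typical LiDAR scans.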

Abstract

Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. Effectively utilizing motion and semantic features while avoiding information loss during 3D-to-2D projection remains a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) that fuses motion-semantic features from different 2D representations of point clouds. To exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with the motion features and provide effective guidance for the motion branches. We validate the effectiveness of the proposed multi-branch fusion MOS framework via comprehensive experiments, and our proposed model outperforms existing state-of-the-art models on the SemanticKITTI benchmark.
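To make the two 2D representations concrete, the sketch below projects a LiDAR scan into a range view (spherical projection) and a BEV grid, then forms a simple range residual between two scans as a motion cue. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Cartesian BEV grid, the grid sizes, and the field-of-view values (typical of a 64-beam sensor) are all assumed, and the previous scan is assumed to be already transformed into the current scan's coordinate frame.

```python
import numpy as np


def range_view_indices(points, height=64, width=2048,
                       fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherical projection: map each (x, y, z) point to a range-image pixel."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1)
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    col = 0.5 * (1.0 - yaw / np.pi) * width
    row = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    col = np.clip(np.floor(col), 0, width - 1).astype(np.int64)
    row = np.clip(np.floor(row), 0, height - 1).astype(np.int64)
    return row, col, depth


def bev_indices(points, grid=(480, 480), extent=50.0):
    """Map each point to a cell of a Cartesian BEV grid over [-extent, extent] m.

    The Cartesian layout is an assumption; a polar grid is another option.
    """
    row = np.floor((points[:, 0] + extent) / (2.0 * extent) * grid[0])
    col = np.floor((points[:, 1] + extent) / (2.0 * extent) * grid[1])
    row = np.clip(row, 0, grid[0] - 1).astype(np.int64)
    col = np.clip(col, 0, grid[1] - 1).astype(np.int64)
    return row, col


def range_image(points, height=64, width=2048):
    """Rasterize a scan into a range image; empty pixels are set to -1."""
    row, col, depth = range_view_indices(points, height, width)
    image = np.full((height, width), -1.0, dtype=np.float32)
    order = np.argsort(depth)[::-1]  # far points first, nearest written last
    image[row[order], col[order]] = depth[order]
    return image


def residual_map(current, previous):
    """Normalized absolute range difference where both images have a return."""
    valid = (current > 0) & (previous > 0)
    residual = np.zeros_like(current)
    residual[valid] = np.abs(current[valid] - previous[valid]) / current[valid]
    return residual
```

Several points that are distinct in 3D can fall into the same pixel or cell, which is the projection-induced information loss the abstract refers to. The two views collapse different dimensions, which is why they can carry complementary motion cues.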

Paper Structure (16 sections, 20 equations, 4 figures, 3 tables)

Introduction
Related Work
Methodology
Data Preprocessing
Network Structure
Motion Branch Structure Based on Multi-View Residual Map Fusion
Semantic Branch Structure Based on BEV Perspective Projection
Density-aware Adaptive Feature Fusion Module
Loss Function
Experiments
Experimental Setup
Evaluation Results and Comparisons
Ablation Studies
Qualitative Analysis
Computational Efficiency
...and 1 more section

Figures (4)

Figure 1: Upper: Comparison of 3D moving object segmentation results from our proposed MV-MOS (upper left) and the baseline MotionBEV (upper right). Correctly segmented moving objects and incorrect segmentations (e.g., parked cars identified as moving cars) are colored in blue and red, respectively. Lower: Comparison of the performance of MOS models with different branch designs on the SemanticKITTI-MOS benchmark; our proposed multi-view fusion model achieves the highest IoU.
Figure 2: Overview of the proposed MV-MOS framework. In the motion branch, the motion information of moving objects is derived by fusing two residual map features from the BEV and range view representations of LiDAR point clouds. The semantic branch extracts rich appearance features that supplement and guide the motion branch. The Mamba-based feature fusion module generates the synthesized features used to predict the final output.
Figure 3: Structure of the proposed adaptive feature fusion module.
Figure 4: Qualitative moving object segmentation results of different models on the SemanticKITTI validation set. True positive, false negative, and false positive points are colored in blue, green, and red, respectively. Incorrect segmentation results and missed moving objects are highlighted with red and green circles, respectively (best viewed in color and with magnification).
