MF-MOS: A Motion-Focused Model for Moving Object Segmentation

Jintao Cheng, Kang Zeng, Zhuoxu Huang, Xiaoyu Tang, Jin Wu, Chengxi Zhang, Xieyuanli Chen, Rui Fan

TL;DR

MF-MOS introduces a motion-focused, dual-branch LiDAR MOS framework that decouples spatial-temporal motion cues (via residual maps) from semantic guidance (via range images). Key components include the Strip Average Pooling Layer (SAPL) for cross-branch fusion, a 3D Spatial-Guided Information Enhancement Module (SIEM) to refine sparse point-cloud signals, and a distribution-based data augmentation scheme over residual maps. The approach achieves state-of-the-art performance on SemanticKITTI-MOS (IoU up to 76.7% on the test set) and demonstrates strong generalization on Apollo, with ablations confirming the contribution of each module. The work advances real-time, robust MOS by effectively leveraging motion information while preserving semantic context, offering practical benefits for autonomous driving perception systems.

Abstract

Moving object segmentation (MOS) provides a reliable way to detect traffic participants and is therefore of great interest in autonomous driving. Capturing dynamics is critical to the MOS problem. Previous methods capture motion features directly from range images. In contrast, we argue that residual maps offer greater potential for motion information, while range images contain rich semantic guidance. Based on this intuition, we propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation. We decouple the spatial-temporal information by capturing motion from residual maps and generating semantic features from range images, which serve as movable-object guidance for the motion branch. This straightforward yet distinctive design makes the most of both range images and residual maps, greatly improving performance on the LiDAR-based MOS task. At the time of submission, MF-MOS achieved a leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset, demonstrating state-of-the-art performance. The implementation of MF-MOS has been released at https://github.com/SCNU-RISLAB/MF-MOS.
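
The abstract does not spell out how the two inputs are built. The following is a minimal sketch, assuming the formulation commonly used in range-image-based LiDAR MOS: each scan is projected spherically to a range image, a past scan is first transformed into the current frame, and the residual is the range difference normalized by the current range. The function names, the 64 x 2048 image size, and the +3/-25 degree vertical field of view are illustrative assumptions, not values taken from the MF-MOS code.

```python
import numpy as np

def transform_points(points, T):
    """Apply a 4x4 rigid transform T to an (N, 3) array of points."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homo @ T.T)[:, :3]

def range_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Spherically project (N, 3) points to an (H, W) range image.

    Empty pixels are set to -1. Sensor parameters are assumed defaults.
    """
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0
    points, depth = points[keep], depth[keep]
    yaw = -np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)
    u = np.clip(np.floor(0.5 * (yaw / np.pi + 1.0) * W), 0, W - 1).astype(np.int64)
    v = np.clip(np.floor((1.0 - (pitch - fov_down) / fov) * H), 0, H - 1).astype(np.int64)
    # Keep the nearest return when several points fall into one pixel.
    image = np.full((H, W), np.inf, dtype=np.float32)
    np.minimum.at(image, (v, u), depth.astype(np.float32))
    image[np.isinf(image)] = -1.0
    return image

def residual_map(range_cur, range_prev_aligned):
    """Normalized absolute range difference; 0 where either pixel is empty."""
    valid = (range_cur > 0) & (range_prev_aligned > 0)
    res = np.zeros_like(range_cur)
    res[valid] = np.abs(range_prev_aligned[valid] - range_cur[valid]) / range_cur[valid]
    return res

# Toy example: one point moves 2 m along its ray, the other stays put.
cur_pts = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
prev_pts = np.array([[8.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
T_prev_to_cur = np.eye(4)  # identity pose, i.e. a stationary sensor (assumption)

range_cur = range_projection(cur_pts)
range_prev = range_projection(transform_points(prev_pts, T_prev_to_cur))
res = residual_map(range_cur, range_prev)
```

In this toy case only the pixel of the moving point has a nonzero residual, which is the kind of signal the motion branch consumes, while the range image itself carries the geometry used for semantic guidance.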

Paper Structure

This paper contains 18 sections, 7 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Core idea of the proposed motion-focused model. The blue parts in (a) represent the point cloud of movable objects and the red parts in (b) represent the point cloud of moving objects. The moving objects are usually a subset of movable objects. Our MF-MOS emphasizes motion information (via residual maps) and utilizes movable features (via the range image) to provide semantic enhancement.
  • Figure 2: The overall architecture of MF-MOS is a dual-input, dual-output branching structure. The semantic branch (bottom), which takes the range image as input, predicts movable objects in the current frame, and the motion branch (top) takes the residual maps as input to predict moving objects. The intermediate feature maps from the encoder of the semantic branch are fused into the motion branch through the MGA module. To further refine the segmentation, the output of the motion branch is fed to the SIEM to obtain the final point cloud segmentation results.
  • Figure 3: Enhancing 3D Spatial Information with the SGB. The SGB partitions and enriches features across dimensions before fusion, aiming to distill insights from sparse point clouds.
  • Figure 4: Illustration of the SIEM. The process involves voxelization of the initial feature map, followed by SGB and Devoxelization. The resulting output is fused with the Point-MLP output and classified.
  • Figure 5: $K$-frame residual maps using different frame strides $\Delta{t}$. The red-boxed region shows residual feature responses corresponding to the different moving speeds of objects. A larger $\Delta{t}$ corresponds to slower-moving objects. Here we show results from eight-frame residual maps.
  • ...and 1 more figures
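
The voxelization and devoxelization steps named in the Figure 4 caption can be illustrated with a small round trip. This is only a sketch under stated assumptions: point features are mean-pooled per voxel, the SGB that sits between the two steps is left out (voxel features pass through unchanged), and devoxelization is a plain nearest-voxel gather. The function names and the voxel size are hypothetical, and the paper's SIEM may use a different pooling or interpolation scheme.

```python
import numpy as np

def voxelize_mean(points, feats, voxel_size=0.5):
    """Mean-pool (N, C) point features into the voxels that contain the points.

    Returns the integer voxel coordinates, the pooled voxel features, and
    for each point the index of its voxel.
    """
    coords = np.floor(points / voxel_size).astype(np.int64)
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # flatten for consistency across NumPy versions
    sums = np.zeros((len(uniq), feats.shape[1]), dtype=np.float64)
    np.add.at(sums, inverse, feats)  # accumulate features per voxel
    counts = np.bincount(inverse, minlength=len(uniq)).astype(np.float64)
    return uniq, sums / counts[:, None], inverse

def devoxelize_nearest(voxel_feats, inverse):
    """Copy each voxel's feature back to every point inside that voxel."""
    return voxel_feats[inverse]

# Toy example: two points share a voxel, a third sits in another voxel.
points = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.1, 0.0, 0.0]])
feats = np.array([[1.0], [3.0], [5.0]])

voxel_coords, voxel_feats, inverse = voxelize_mean(points, feats)
# A 3D module such as the SGB would transform voxel_feats at this point.
point_feats = devoxelize_nearest(voxel_feats, inverse)
```

The per-point output has the same length as the input, so it could then be combined with a point-wise branch, which is the role the caption assigns to the Point-MLP output.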