SemanticFlow: A Self-Supervised Framework for Joint Scene Flow Prediction and Instance Segmentation in Dynamic Environments

Yinqi Chen; Meiying Zhang; Qi Hao; Guang Zhou

TL;DR

SemanticFlow tackles dynamic scene understanding by jointly predicting 3D scene flow and instance segmentation from consecutive point clouds in a self-supervised, multi-task framework. It introduces a coarse-to-fine strategy with a shared backbone and a suite of interdependent losses to enforce motion-semantics consistency, reinforced by a self-supervised pseudo-labeling pipeline. Empirical results on Waymo and Argoverse 2 show competitive scene flow accuracy and enhanced segmentation metrics, with notable robustness under limited labeling. The approach demonstrates practical impact for downstream autonomous driving tasks such as SLAM, obstacle avoidance, and planning, while reducing annotation reliance.

Abstract

Accurate perception of dynamic traffic scenes is crucial for high-level autonomous driving systems, requiring robust object motion estimation and instance segmentation. However, traditional methods often treat these as separate tasks; without information sharing between them, this leads to suboptimal performance, spatio-temporal inconsistencies, and inefficiency in complex scenarios. This paper proposes a multi-task SemanticFlow framework that simultaneously predicts scene flow and instance segmentation for full-resolution point clouds. The novelty of this work is threefold: 1) a multi-task scheme based on coarse-to-fine prediction, in which an initial coarse segmentation into static background and dynamic objects provides contextual information for refining motion and semantic information through a shared feature processing module; 2) a set of loss functions that improve scene flow estimation and instance segmentation while helping to ensure the spatial and temporal consistency of both static and dynamic objects within traffic scenes; 3) a self-supervised learning scheme that uses the coarse segmentation to detect rigid objects and compute their transformation matrices between sequential frames, enabling the generation of self-supervised labels. The proposed framework is validated on the Argoverse and Waymo datasets, demonstrating superior performance in instance segmentation accuracy, scene flow estimation, and computational efficiency, and establishing a new benchmark for self-supervised methods in dynamic scene understanding.
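The third contribution, deriving labels from per-object rigid transformations, can be illustrated with a minimal sketch. The abstract does not state how the transformation matrices are computed, so the sketch below assumes a standard least-squares (Kabsch/SVD) fit over points of one rigid object that are already matched between two frames. The function names `estimate_rigid_transform` and `pseudo_flow_labels` are hypothetical and are not taken from the paper.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ~= src @ R.T + t.

    Assumption: src and dst are (N, 3) arrays of already-matched points
    belonging to a single rigid object in two consecutive frames.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def pseudo_flow_labels(points, R, t):
    """Per-point flow implied by moving `points` rigidly with (R, t)."""
    return points @ R.T + t - points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obj_t0 = rng.normal(size=(50, 3))  # synthetic object points at frame t
    yaw = 0.3
    R_true = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
                       [np.sin(yaw),  np.cos(yaw), 0.0],
                       [0.0,          0.0,         1.0]])
    t_true = np.array([1.0, -2.0, 0.5])
    obj_t1 = obj_t0 @ R_true.T + t_true  # same points at frame t+1

    R, t = estimate_rigid_transform(obj_t0, obj_t1)
    flow = pseudo_flow_labels(obj_t0, R, t)
    print(np.allclose(flow, obj_t1 - obj_t0))
```

In this toy setting the correspondences are exact, so the recovered flow matches the true displacement. On real LiDAR sweeps correspondences are not given, and how the framework obtains them from the coarse segmentation is not described in the abstract.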
