SSFlowNet: Semi-supervised Scene Flow Estimation On Point Clouds With Pseudo Label

Jingze Chen, Junfeng Yao, Qiqin Lin, Rongzhou Zhou, Lei Li

TL;DR

SSFlowNet addresses the high labeling cost of 3D scene flow estimation on point clouds with a semi-supervised framework that generates high-quality pseudo-labels through correlation-matrix-guided propagation and a spatial memory module. A Flow-Graph Encoder constructs a geometric graph to learn cross-frame similarities, while a correlation matrix maps labeled points to unlabeled points to refine flow estimates. The training objective combines a Chamfer loss with a weighted smoothness term, enabling learning from sparse labels and unlabeled data. Experiments on FlyingThings3D and KITTI show improved pseudo-label quality and competitive performance with substantially reduced labeling effort, which suggests practical potential for autonomous driving and SLAM tasks.
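
As an illustration of the training objective described above, the following is a minimal NumPy sketch of a Chamfer loss combined with a weighted smoothness term. The function names, the neighbourhood size `k`, and the weight `smooth_weight` are assumptions made for this example; the paper's exact formulation and weighting may differ.

```python
import numpy as np

def chamfer_loss(warped, target):
    """Symmetric Chamfer distance between two point sets of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = ((warped[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
    # Nearest-neighbour distance in both directions, averaged.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def smoothness_loss(points, flow, k=4):
    """Penalise flow differences between each point and its k nearest neighbours."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    # Column 0 of the sorted indices is the point itself, so skip it.
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]
    diff = flow[:, None, :] - flow[idx]  # shape (N, k, 3)
    return np.linalg.norm(diff, axis=-1).mean()

def total_loss(points, flow, target, smooth_weight=0.5, k=4):
    """Chamfer term on the warped cloud plus a weighted smoothness term (weight is assumed)."""
    warped = points + flow
    return chamfer_loss(warped, target) + smooth_weight * smoothness_loss(points, flow, k)
```

In this sketch a rigid translation that maps the source cloud exactly onto the target gives zero loss, since both the Chamfer term and the smoothness term vanish.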

Abstract

In the domain of supervised scene flow estimation, the process of manual labeling is both time-intensive and financially demanding. This paper introduces SSFlowNet, a semi-supervised approach for scene flow estimation that utilizes a blend of labeled and unlabeled data, optimizing the balance between the cost of labeling and the precision of model training. SSFlowNet stands out through its use of pseudo-labels, reducing the dependency on extensively labeled datasets while maintaining high model accuracy. The core of our model is its emphasis on the intricate geometric structures of point clouds, both locally and globally, coupled with a novel spatial memory feature. This feature learns the geometric relationships between points over sequential time frames. By identifying similarities between labeled and unlabeled points, SSFlowNet dynamically constructs a correlation matrix to evaluate scene flow dependencies at the level of individual points. Furthermore, the integration of a flow consistency module within SSFlowNet enhances its capability to estimate flow consistently, an essential aspect of analyzing dynamic scenes. Empirical results demonstrate that SSFlowNet surpasses existing methods in pseudo-label generation and shows adaptability across varying data volumes. Moreover, our semi-supervised training technique yields promising outcomes even with smaller ratios of labeled data, marking a substantial advancement in the field of scene flow estimation.
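
The correlation-matrix idea in the abstract can be sketched as follows. This is a simplified stand-in, not the paper's method: SSFlowNet learns the correlation with an MLP, whereas this example assumes a fixed softmax over negative squared distances in coordinate and feature space. The function name `propagate_flow` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def propagate_flow(unlabeled_xyz, unlabeled_feat,
                   labeled_xyz, labeled_feat, labeled_flow,
                   temperature=1.0):
    """Propagate flow from labeled to unlabeled points via a correlation matrix.

    Assumption: similarity is a softmax over negative squared distances,
    used here in place of the learned MLP described in the paper.
    """
    # Squared distances between every unlabeled and every labeled point, shape (U, L).
    d_xyz = ((unlabeled_xyz[:, None, :] - labeled_xyz[None, :, :]) ** 2).sum(axis=-1)
    d_feat = ((unlabeled_feat[:, None, :] - labeled_feat[None, :, :]) ** 2).sum(axis=-1)
    logits = -(d_xyz + d_feat) / temperature
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    corr = np.exp(logits)
    corr = corr / corr.sum(axis=1, keepdims=True)
    # Each pseudo-label is a correlation-weighted average of the labeled flows.
    pseudo_flow = corr @ labeled_flow
    return pseudo_flow, corr
```

Each row of `corr` sums to one, so every pseudo-label is a convex combination of the labeled flow vectors, with closer and more similar labeled points contributing more.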

Paper Structure (19 sections, 11 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The SSFlowNet model dynamically learns point-level similarities by aggregating geometric features across two frames, utilizing global pseudo-labels derived from known true labels to enhance training efficiency.
  • Figure 2: Overview of the proposed SSFlowNet. (i) We start by simply up-sampling the ground truth labels to obtain the coarse flow. (ii) We then extract neighboring features at time $t-1$ and assimilate the features at time $t$ through the spatial memory module. (iii) Finally, we fuse and sum the points based on their features and feed them to an MLP that computes the correlation matrix used to generate the pseudo labels.
  • Figure 3: Details about the feature encoder and on-stream sampling. (a) Our feature encoder is divided into two parts: the first extracts the current features, while the second uses spatial memory to save the point features for the subsequent moment and then re-extracts these features. These edge features are then processed through the set convolutional layers. (b) The input to the MLP includes the difference in point coordinates as well as the point features, and the output is the correlation matrix. This matrix represents the influence factor of each labeled point on the $p_i$ points.
  • Figure 4: Results on FlyingThings3D (top) and KITTI (bottom). The red points represent the points of $Q$, while the green points denote the points warped by the pseudo labels. The figure shows that our pseudo label generation model can approximate ground truth labels.
  • Figure 5: Results of our method compared with the ground truth labels (a) and the noisy labels (b). The red points show the actual $Q$, the green points show $P$ warped by the generated labels, and the blue points show the outliers.
  • ...and 2 more figures
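
The visual comparison described in the captions for Figures 4 and 5 (points of $P$ warped by pseudo labels, compared against $Q$, with outliers highlighted) can be approximated with a short sketch. The outlier criterion used here, a nearest-neighbour distance above a fixed `threshold`, is an assumption for illustration; the captions do not state how the paper defines outliers.

```python
import numpy as np

def warp_and_flag_outliers(P, pseudo_flow, Q, threshold=0.1):
    """Warp P by the pseudo flow and flag warped points that land far from Q.

    Assumption: a warped point counts as an outlier when its nearest
    neighbour in Q is farther away than `threshold` (a hypothetical value).
    """
    warped = P + pseudo_flow
    # Distance from every warped point to every point of Q, shape (N, M).
    d = np.linalg.norm(warped[:, None, :] - Q[None, :, :], axis=-1)
    nearest = d.min(axis=1)
    return warped, nearest > threshold
```

Plotting `Q` in red, the inlier rows of `warped` in green, and the flagged rows in blue would reproduce the colour scheme the captions describe.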