RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud

Zihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver van Kaick, Hao Zhang, Hui Huang

TL;DR

RPM-Net infers movable parts and predicts their motions from a single unsegmented 3D point cloud. Its core is an encoder–decoder RNN with interleaved LSTMs that hallucinates a sequence of dense per-point displacements and, from them, produces a motion-based segmentation; a separate network, Mobility-Net, estimates high-level motion parameters from the predicted sequence. The approach handles partial scans and captures hierarchical motion through recursive application, and the quantitative and qualitative results show improvements over prior methods such as Shape2Motion. Limitations include ambiguities in geometry and data scarcity, which motivate future work on pose-invariant training and richer motion datasets.

Abstract

We introduce RPM-Net, a deep learning-based approach which simultaneously infers movable parts and hallucinates their motions from a single, un-segmented, and possibly partial, 3D point cloud shape. RPM-Net is a novel Recurrent Neural Network (RNN), composed of an encoder-decoder pair with interleaved Long Short-Term Memory (LSTM) components, which together predict a temporal sequence of pointwise displacements for the input point cloud. At the same time, the displacements allow the network to learn movable parts, resulting in a motion-based shape segmentation. Recursive applications of RPM-Net on the obtained parts can predict finer-level part motions, resulting in a hierarchical object segmentation. Furthermore, we develop a separate network to estimate part mobilities, e.g., per-part motion parameters, from the segmented motion sequence. Both networks learn deep predictive models from a training set that exemplifies a variety of mobilities for diverse objects. We show results of simultaneous motion and part predictions from synthetic and real scans of 3D objects exhibiting a variety of part mobilities, possibly involving multiple movable parts.

Paper Structure

This paper contains 41 sections, 15 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Given an unsegmented, possibly partial, point cloud shape, our deep recurrent neural network, RPM-Net, simultaneously hallucinates a motion sequence (via point-wise displacements) and infers a motion-based segmentation of the shape into, possibly multiple, movable parts. RPM-Net predicts a non-trivial motion for the umbrella and multi-part motions for both the cabinet (drawer sliding and door rotating) and the office chair (seat moving up and wheels rotating). The umbrella and cabinet are synthetic scans while the office chair is a single-view scan acquired with a Kinect sensor. Input scans to RPM-Net were downsampled to 2,048 points.
  • Figure 2: Training data generation. For each mobility unit, we sample $n$ frames from the start to the end of the motion and compute the displacement field between each pair of adjacent frames. In this example, we see the sampling of the rotational motion of an electric fan, where the rotation angle range is defined to be $[0^{\circ}, 120^{\circ}]$ due to the rotational symmetry of the shape.
  • Figure 3: The architecture of the motion hallucination network RPM-Net. Given an input point cloud $\mathcal{P}_0$, the network predicts displacement maps $\{ \mathcal{D}_t \}$ along with the segmentation $\mathcal{S}$ of the point cloud, which together provide the final segmented motion sequence $\{ \mathcal{P}_t^{\mathcal{S}}\}$. The network is composed of set abstractions (SA), feature propagations (FP), LSTM units, fully-connected layers (FC), and special operations denoted with the pink circles.
  • Figure 4: The architecture of the mobility prediction network Mobility-Net. For each segmented moving component $\mathcal{S}_{mov}^{i}$, the network takes the point cloud $\mathcal{P}_0$ and the corresponding generated displacement maps $\{ \mathcal{D}_t^i\}$ as input to predict the high-level mobility parameters $(\tau_i, d_i, x_i)$. The network is composed of an encoder (SA) and fully-connected layers (FC).
  • Figure 5: Motion prediction results on shapes with a single moving part. We observe how our method can be applied to a variety of shapes with diverse mobilities, including both complete point clouds and partial scans. For each input cloud, we show the first four frames of the predicted motion, along with the predicted transformation axis drawn as a green line, and moving and reference parts colored red and gray, respectively. We observe how RPM-Net can predict the correct motion sequences for different inputs and estimate the corresponding part mobility parameters.
  • ...and 13 more figures
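
The training data generation described in the Figure 2 caption (sample $n$ frames from the start to the end of a motion, then take the displacement field between each pair of adjacent frames) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' data pipeline: the function names, the boolean `moving_mask`, the pivot/axis parameterization, the uniform angle spacing, and the toy three-point cloud are all assumptions made for the example. Only the idea of sampling frames over a rotation range such as $[0^{\circ}, 120^{\circ}]$ and differencing adjacent frames comes from the caption.

```python
import math


def rotate_about_axis(point, pivot, axis, angle_rad):
    """Rotate a 3D point about the line through `pivot` with unit direction
    `axis`, using Rodrigues' rotation formula."""
    px, py, pz = (point[i] - pivot[i] for i in range(3))
    ax, ay, az = axis
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    dot = ax * px + ay * py + az * pz
    # Cross product axis x p.
    cx, cy, cz = ay * pz - az * py, az * px - ax * pz, ax * py - ay * px
    rx = px * c + cx * s + ax * dot * (1 - c)
    ry = py * c + cy * s + ay * dot * (1 - c)
    rz = pz * c + cz * s + az * dot * (1 - c)
    return (rx + pivot[0], ry + pivot[1], rz + pivot[2])


def sample_rotation_frames(points, moving_mask, pivot, axis,
                           angle_range_deg, n_frames):
    """Sample `n_frames` poses of a rotating part, uniformly spaced over the
    angle range (an assumption). Points with a False mask entry belong to the
    reference part and stay fixed."""
    if n_frames < 2:
        raise ValueError("need at least two frames to define a motion")
    start, end = angle_range_deg
    frames = []
    for t in range(n_frames):
        angle = math.radians(start + (end - start) * t / (n_frames - 1))
        frames.append([
            rotate_about_axis(p, pivot, axis, angle) if moving else tuple(p)
            for p, moving in zip(points, moving_mask)
        ])
    return frames


def displacement_fields(frames):
    """Per-point displacement between each pair of adjacent frames."""
    return [
        [tuple(q[i] - p[i] for i in range(3)) for p, q in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# Toy example (hypothetical data): two moving points, one reference point,
# rotating about the z-axis through the origin over [0, 120] degrees.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]
moving_mask = [True, True, False]
frames = sample_rotation_frames(points, moving_mask, pivot=(0.0, 0.0, 0.0),
                                axis=(0.0, 0.0, 1.0),
                                angle_range_deg=(0.0, 120.0), n_frames=5)
fields = displacement_fields(frames)
```

With $n$ frames there are $n - 1$ displacement fields, and adding them in order to the first frame recovers the last frame. Points on the reference part have zero displacement in every field, which is the cue that a motion-based segmentation can exploit.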