Object Dynamics Modeling with Hierarchical Point Cloud-based Representations

Chanho Kim, Li Fuxin

TL;DR

This work addresses object-dynamics prediction in 3D by introducing a point-based, geometry-aware neural network that operates on dense point clouds and meshes. It combines Object PointConv for within-object force propagation with Relational PointConv for inter-object interactions inside a hierarchical U-Net, and extends to mesh data by computing interaction points on faces. The approach yields state-of-the-art results on gravity- and collision-centric tasks, outperforming graph neural network baselines on Physion and Kubric datasets, with notable gains in non-rigid drape scenarios and when sampling densely on surfaces. By leveraging continuous point convolutions and an object-centric hierarchy, the method provides accurate, scalable dynamics modeling that can bridge advancements in point-cloud and graph-based physics learning.

Abstract

Modeling object dynamics with a neural network is an important problem with numerous applications. Most recent work has been based on graph neural networks. However, physics happens in 3D space, where geometric information potentially plays an important role in modeling physical phenomena. In this work, we propose a novel U-Net architecture based on continuous point convolution, which naturally embeds information from 3D coordinates and allows for multi-scale feature representations with established downsampling and upsampling procedures. Bottleneck layers in the downsampled point clouds lead to better long-range interaction modeling. In addition, the flexibility of point convolutions allows our approach to generalize to sparsely sampled points from mesh vertices and to dynamically generate features on important interaction points on mesh faces. Experimental results demonstrate that our approach significantly improves on the state of the art, especially in scenarios that require accurate gravity or collision reasoning.

Paper Structure (27 sections, 8 equations, 4 figures, 5 tables)


Figures (4)

  • Figure 1: We propose a point-based convolutional neural network that is capable of learning object dynamics. Two different types of convolution operations, Object PointConv and Relational PointConv, are applied alternately to model force propagation within the same object and across different objects, respectively. A U-Net architecture encodes the point cloud into a smaller point cloud to capture long-range interactions and then decodes back to the original point cloud to make predictions. Point-based continuous convolution allows the proposed model to be compatible with both point cloud and mesh inputs with minor modifications.
  • Figure 2: (a) As pointed out by Allen et al. (2022), given two mesh faces of different objects, a neighborhood based on mesh vertices (dotted lines) may not capture proximity between two mesh faces (red solid line), depending on the location where a collision occurs. (b) We model face-to-face collision with three PointConv layers using dynamically selected interaction points (red dots) on the surfaces.
  • Figure 3: The proposed U-Net architecture. The input point cloud goes through Object PointConv and Relational PointConv alternately, with successive downsampling layers in the encoding stage. In the bottleneck layers, the voxel sizes are large and the number of points is small; hence, long-range interactions are captured with only a few layers. In the decoder, Object PointConv with interpolation upsamples the point cloud at each level to the point locations of the previous level. Once the point cloud has been upsampled back to its original size, pointwise velocity or acceleration is predicted. We use $32$ as the base channel dimension $C$ for the experiments in this paper.
  • Figure 4: The proposed U-Net architecture for mesh inputs. Unlike its counterpart designed for dense point cloud inputs, we only use Relational PointConv at the highest resolution, as some object shapes might not be accurately represented with fewer mesh vertices. In this U-Net architecture, the downsampling layers contain only Object PointConv, which propagates collision effects over the mesh vertices of each object.
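
The distinction between Object PointConv and Relational PointConv described in the captions above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `neighbor_masks` and `point_conv`, the fixed `radius`, the two-layer weight MLP (`w1`, `w2`), and the depthwise form of the convolution are all assumptions made for illustration. The only idea taken from the text is that one operation aggregates over nearby points of the same object and the other over nearby points of different objects, with convolution weights computed from relative 3D coordinates.

```python
import numpy as np

def neighbor_masks(points, object_ids, radius):
    """Split radius neighborhoods into same-object and cross-object pairs.

    points: [N, 3] float array, object_ids: [N] int array.
    Returns two boolean [N, N] arrays (object_mask, relational_mask).
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    within = dist <= radius
    same = object_ids[:, None] == object_ids[None, :]
    return within & same, within & ~same

def point_conv(points, feats, mask, w1, w2):
    """Simplified (depthwise) continuous point convolution.

    Assumption: per-channel weights come from a tiny ReLU MLP applied to the
    relative offset between a point and each neighbor selected by `mask`.
    """
    rel = points[None, :, :] - points[:, None, :]   # [N, N, 3], offset i -> j
    hidden = np.maximum(rel @ w1, 0.0)              # [N, N, H]
    weights = hidden @ w2                           # [N, N, C]
    weighted = weights * feats[None, :, :] * mask[:, :, None]
    return weighted.sum(axis=1)                     # [N, C]

# Toy scene: two objects with two points each, lying on the x-axis.
points = np.array([[0.0, 0.0, 0.0],
                   [0.1, 0.0, 0.0],
                   [0.15, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])
object_ids = np.array([0, 0, 1, 1])

obj_mask, rel_mask = neighbor_masks(points, object_ids, radius=0.2)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 2))   # two feature channels per point
w1 = rng.normal(size=(3, 8))      # hypothetical weight-MLP parameters
w2 = rng.normal(size=(8, 2))

out_obj = point_conv(points, feats, obj_mask, w1, w2)  # within-object step
out_rel = point_conv(points, feats, rel_mask, w1, w2)  # cross-object step
```

In this toy scene, the last point has no cross-object neighbor within the radius, so its relational output is all zeros; only points near a potential contact receive inter-object information at this resolution. A full model would alternate the two steps and repeat them on downsampled point sets.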