Localized Graph-Based Neural Dynamics Models for Terrain Manipulation

Chaoqi Liu, Yunzhu Li, Kris Hauser

TL;DR

Terrain manipulation requires accurate predictive models of high-dimensional, deformable terrain. The authors propose Localized Graph-Based Neural Dynamics (L-GBND), which learns a region-of-interest (RoI) proposer and RoI-aware dynamics on a large particle graph, augmented with boundary-aware node features. The forward model follows $\hat{x}_{t+1} = f(x_t, u_t)$, with computation restricted to the RoI; this yields orders-of-magnitude speedups and lower memory usage while preserving accuracy. Planning uses MPPI to select trajectories toward a target heightmap. The approach is validated in simulation and in real-world experiments on excavation and shaping tasks across materials, demonstrating strong sim-to-real transfer and scalable planning for terrain manipulation.
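To make the planning step concrete, the following is a minimal sketch of a single MPPI update against a target heightmap. It is an illustration under stated assumptions, not the authors' implementation: `mppi_plan`, `toy_dynamics`, and `heightmap_cost` are hypothetical names, the toy dynamics (each control entry directly changes one heightmap cell) stands in for the learned model $f$, and the sample count, noise scale, and temperature are arbitrary choices.

```python
import math
import random


def mppi_plan(state, dynamics, cost, horizon, control_dim,
              num_samples=256, noise_std=0.5, temperature=1.0, rng=None):
    """One MPPI update; returns a list of `horizon` controls."""
    rng = rng or random.Random(0)
    # Zero nominal sequence (assumption: no warm start from a previous plan).
    nominal = [[0.0] * control_dim for _ in range(horizon)]
    samples, costs = [], []
    for _ in range(num_samples):
        # Perturb the nominal control sequence with Gaussian noise.
        seq = [[u + rng.gauss(0.0, noise_std) for u in step] for step in nominal]
        # Roll the sampled sequence out through the forward model.
        x = state
        for u in seq:
            x = dynamics(x, u)
        samples.append(seq)
        costs.append(cost(x))
    # Softmin weights: low-cost rollouts dominate the weighted average.
    best = min(costs)
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [
        [sum(w * s[t][d] for w, s in zip(weights, samples)) / total
         for d in range(control_dim)]
        for t in range(horizon)
    ]


def toy_dynamics(heights, control):
    # Hypothetical stand-in for the learned model f(x_t, u_t).
    return [h + u for h, u in zip(heights, control)]


def heightmap_cost(target):
    # Squared error between the predicted and target heightmaps.
    return lambda heights: sum((h - g) ** 2 for h, g in zip(heights, target))


start = [1.0, 1.0, 1.0]
target = [1.0, 0.2, 1.0]   # dig a trench in the middle cell
cost = heightmap_cost(target)
plan = mppi_plan(start, toy_dynamics, cost, horizon=3, control_dim=3,
                 rng=random.Random(0))

final = start
for u in plan:
    final = toy_dynamics(final, u)
```

In a receding-horizon loop, only the first control of `plan` would typically be executed before replanning from the newly observed state. The sketch returns the cost-weighted average of the sampled sequences, which is the standard MPPI update.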

Abstract

Predictive models can be particularly helpful for robots to effectively manipulate terrains in construction sites and extraterrestrial surfaces. However, terrain state representations become extremely high-dimensional, especially when capturing fine-resolution details and when depth is unknown or unbounded. This paper introduces a learning-based approach for terrain dynamics modeling and manipulation, leveraging the Graph-based Neural Dynamics (GBND) framework to represent terrain deformation as motion of a graph of particles. Based on the principle that the moving portion of a terrain is usually localized, our approach builds a large terrain graph (potentially millions of particles) but identifies only a very small active subgraph (hundreds of particles) for predicting the outcomes of robot-terrain interaction. To minimize the size of the active subgraph, we introduce a learning-based approach that identifies a small region of interest (RoI) based on the robot's control inputs and the current scene. We also introduce a novel domain boundary feature encoding that allows GBNDs to perform accurate dynamics prediction in the RoI interior while avoiding particle penetration through RoI boundaries. Our proposed method is orders of magnitude faster than naive GBND and achieves better overall prediction accuracy. We further evaluate our framework on excavation and shaping tasks on terrains of different granularity.
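The localization idea can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: `propose_roi` is a simple geometric stand-in for the learned RoI proposer, `boundary_feature` is one possible boundary encoding (a normalized distance to the RoI boundary), and `toy_dynamics` is a hand-written placeholder for the GBND. The sketch shows only the bookkeeping: the dynamics function sees the active subgraph alone, and all other particles are copied unchanged.

```python
import math


def propose_roi(particles, tool_pos, radius):
    """Hypothetical geometric stand-in for the learned RoI proposer:
    indices of particles within `radius` of the tool."""
    return [i for i, p in enumerate(particles)
            if math.dist(p, tool_pos) <= radius]


def boundary_feature(p, tool_pos, radius):
    """Assumed encoding: 1 at the RoI center, 0 on the RoI boundary."""
    return (radius - math.dist(p, tool_pos)) / radius


def localized_step(particles, tool_pos, control, radius, dynamics):
    """Advance only the active subgraph; copy every other particle."""
    active = propose_roi(particles, tool_pos, radius)
    feats = [boundary_feature(particles[i], tool_pos, radius) for i in active]
    moved = dynamics([particles[i] for i in active], feats, control)
    nxt = list(particles)
    for i, p in zip(active, moved):
        nxt[i] = p
    return nxt, active


def toy_dynamics(points, feats, control):
    # Placeholder for the GBND: the displacement fades to zero at the
    # RoI boundary, so no particle is pushed across it.
    return [tuple(c + u * f for c, u in zip(p, control))
            for p, f in zip(points, feats)]


# 20 x 20 grid of 2D particles with unit spacing (400 particles in total).
particles = [(float(x), float(y)) for x in range(20) for y in range(20)]
nxt, active = localized_step(particles, tool_pos=(10.0, 10.0),
                             control=(0.0, -0.5), radius=2.5,
                             dynamics=toy_dynamics)
print(f"{len(active)} of {len(particles)} particles were updated")
```

With this grid and radius, 21 of the 400 particles fall inside the RoI, so the placeholder dynamics runs on about 5% of the scene. The same pattern applies to 3D particles, since `math.dist` accepts points of any dimension.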

Paper Structure

This paper contains 15 sections, 6 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: Evaluation platforms. Our terrain manipulation system includes a UR5e robotic arm equipped with an overhead RGB-D camera. We are also extending this system to a mobile scooping platform for in-the-wild studies.
  • Figure 2: System overview. The proposed dynamics model is shown in the central block: scene particles and the control input are fed into the RoI proposer, which selects particles likely to move based on the interaction caused by the control input. This is followed by a rollout using GBND. When integrated with planning, multiple trajectories are sampled and rolled out in parallel, and the best trajectory, determined by a pre-defined metric, is executed.
  • Figure 3: Visualization of a typical scooping simulation. Our method (left) only predicts the dynamics of the highlighted particles in the predicted RoI. Predictions are computed for an order of magnitude fewer particles than in the full graph (right) while retaining similar accuracy.
  • Figure 4: Rollouts (left to right) from a learned 2D CNN heightmap-based dynamics model compared to L-GBND and ground truth. In the CNN, fine-grained features are lost due to the compressed representation and smoothing bias of CNNs. In contrast, L-GBND preserves volumetric structure and local interactions, yielding 37% lower prediction error on a test dataset.
  • Figure 5: Comparison of our L-GBND against GBND and geometric region proposers of different sizes (Geo-X). Our method demonstrates significant advantages in speed and GPU memory usage. These experiments use $\approx$ 3,000 particles and a batch size of 256; per-sample measurements are reported (i.e., totals divided by 256). Colors and labels are shared by both figures.
  • ...and 4 more figures