Gaussian Splatting Visual MPC for Granular Media Manipulation

Wei-Cheng Tseng, Ellina Zhang, Krishna Murthy Jatavallabhula, Florian Shkurti

TL;DR

This work introduces Gaussian splatting as a latent representation for granular material manipulation and pairs it with a learned visual dynamics model implemented as a graph neural network. By performing gradient-based model-predictive control over the Gaussian-splat state and planning with a density-field objective, the approach achieves efficient, precise manipulation of granular piles in both simulation and real-world settings. The method demonstrates strong reconstruction and short-horizon predictive accuracy, outperforms baselines on manipulation tasks, and shows zero-shot transfer to new environments with differing particle shapes. Overall, the paper highlights Gaussian splatting as a high-signal, physics-informed representation that enhances planning and control for complex, non-rigid materials.
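The TL;DR mentions a density-field objective but does not define it. Below is a minimal sketch that assumes the field is the mean of isotropic 2-D Gaussian kernels centred on the splat centers (projected onto the table plane), and that the cost is the mean squared difference between the current and target fields on a fixed grid. `density_field`, `density_cost`, `sigma`, and the grid are hypothetical choices, not the authors' definition. Comparing densities instead of individual splats removes the need for particle-to-particle correspondences, which matters because individual grains in a pile cannot be tracked.

```python
import numpy as np

def density_field(centers, grid, sigma=0.05):
    """Smoothed density: mean of isotropic Gaussian kernels centred on the splats.

    centers: (N, 2) splat centers projected onto the table plane (assumption).
    grid:    (M, 2) query points. Returns an (M,) array.
    Real splats also carry covariances and opacities; this sketch ignores them.
    """
    d2 = ((grid[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (M, N)
    return np.exp(-d2 / (2.0 * sigma**2)).mean(axis=1)

def density_cost(centers, target_centers, grid, sigma=0.05):
    """Mean squared difference between the current and target density fields."""
    diff = (density_field(centers, grid, sigma)
            - density_field(target_centers, grid, sigma))
    return float(np.mean(diff ** 2))

# Usage: the cost ignores which splat is which, so a shuffled pile scores ~0.
rng = np.random.default_rng(0)
pile = rng.uniform(0.0, 0.2, size=(50, 2))
xs, ys = np.meshgrid(np.linspace(-0.1, 0.5, 30), np.linspace(-0.1, 0.5, 30))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
shuffled = rng.permutation(pile)
print(density_cost(pile, shuffled, grid))  # ~0 (only floating-point rounding differs)
print(density_cost(pile, pile + 0.05, grid) < density_cost(pile, pile + 0.2, grid))  # True
```

Because the cost is permutation-invariant and smooth in the splat positions, it can be differentiated with respect to predicted splat centers without ever matching particles across time steps.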

Abstract

Recent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice remains challenging due to the intricate physics of particle interactions, the high-dimensional and partially observable state, the inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation because they lack suitable inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via model-predictive control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and to generalize zero-shot to new environments. We also show significant improvements in prediction and manipulation performance over existing granular media manipulation methods.
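The TL;DR states that the dynamics model is a graph neural network, and the abstract attributes the weakness of earlier latent models to missing inductive biases. The sketch below shows one message-passing step over splat centers, assuming 2-D centers, a fixed-radius neighbor graph, the pusher as an extra node, and per-splat displacement outputs. `gnn_step`, `mlp`, `radius`, and the random weights are hypothetical; this is not the authors' architecture. Building edges from relative positions gives two biases a generic latent model lacks: locality (a splat is moved only by nearby splats or the pusher) and translation invariance.

```python
import numpy as np

def mlp(x, weights):
    """Bias-free two-layer perceptron with ReLU, applied along the last axis."""
    w1, w2 = weights
    return np.maximum(x @ w1, 0.0) @ w2

def gnn_step(centers, pusher, push, weights, radius=0.05):
    """One message-passing step that predicts a displacement for every splat.

    centers: (N, 2) splat centers; pusher: (2,) pusher position; push: (2,) action.
    Nodes closer than `radius` are connected, including splat-pusher edges.
    Edge features use only relative positions, so the output is translation-invariant.
    """
    n = len(centers)
    nodes = np.vstack([centers, pusher[None, :]])          # pusher is node n
    rel = nodes[:, None, :] - nodes[None, :, :]             # rel[i, j] = x_i - x_j
    adj = np.linalg.norm(rel, axis=-1) < radius
    np.fill_diagonal(adj, False)                             # no self-loops
    sender_is_pusher = np.zeros(n + 1)
    sender_is_pusher[n] = 1.0
    # Edge (receiver i, sender j) features: [x_i - x_j, push, sender-is-pusher flag].
    feats = np.concatenate([
        rel,
        np.broadcast_to(push, (n + 1, n + 1, 2)),
        np.broadcast_to(sender_is_pusher[None, :, None], (n + 1, n + 1, 1)),
    ], axis=-1)
    messages = mlp(feats, weights) * adj[..., None]          # keep only real edges
    return messages.sum(axis=1)[:n]                           # sum over senders, drop pusher row

# Usage with untrained random weights (shapes only; the values are meaningless).
rng = np.random.default_rng(0)
weights = (rng.normal(size=(5, 16)), 0.01 * rng.normal(size=(16, 2)))
centers = np.vstack([rng.uniform(0.0, 0.1, size=(20, 2)), [[5.0, 5.0]]])  # last splat is isolated
pusher = np.array([0.05, -0.01])
push = np.array([0.0, 0.02])
delta = gnn_step(centers, pusher, push, weights)
next_centers = centers + delta
```

Because the graph is rebuilt from the current centers at every step, the same weights apply to piles with any number of splats, and an isolated splat with no neighbors receives no displacement.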

Paper Structure

This paper contains 14 sections, 16 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our method takes a few multi-view images of a scene and their camera poses as input, (a) converts them into a Gaussian splatting representation, (b) learns a dynamics model $f$ over these representations, and (c) performs visual model-predictive control for granular material manipulation, which requires view synthesis and dynamics rollouts.
  • Figure 2: Our framework. (a) Given demonstration trajectories with multi-view observations, we leverage Gaussian splatting representations to reconstruct the observed images at each timestep. (b) The dynamics model $f$ predicts the temporal evolution of the Gaussian splatting representation $\mathbf{Z}_t$ given the input action $\mathbf{u}_t$. In the planning stage, we optimize the action sequence $\{\mathbf{u}_t\}$ by minimizing the task objective $c(\mathbf{Z}_T, \mathbf{Z}_{\text{target}})$.
  • Figure 3: Real-world experiment setup. (a) The robotic manipulator, with a pusher attached to the end-effector, moves granular materials within the workspace. Four calibrated RGBD cameras are mounted around the workspace to provide multi-view observations. (b) The granular materials used in real-world experiments include coffee beans, peanuts, pistachios, and almonds.
  • Figure 4: Dynamics model rollouts. We show rollout predictions of the dynamics model on simulated (left) and real-world (right) data. In both settings, the predictions remain accurate for a few steps.
  • Figure 5: Qualitative results from real-world experiments. (a) Evaluation of our method on a collection task with objects different from those used in training. The objects vary in scale and physical properties (e.g., almonds and pistachios remain quasi-static during MPC steps, while peanuts and coffee beans may roll after being pushed). (b) Pushing object piles into two separate target configurations. Our method successfully pushes randomly scattered objects into the desired locations.
  • ...and 3 more figures
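Figure 2 describes planning as optimizing the action sequence $\{\mathbf{u}_t\}$ through the dynamics model $f$ so that the predicted $\mathbf{Z}_T$ minimizes $c(\mathbf{Z}_T, \mathbf{Z}_{\text{target}})$. The following is a minimal receding-horizon sketch of that loop under loud assumptions: the state is reduced to 2-D splat centers, `f` is a hypothetical stand-in that rigidly translates the pile (not the learned GNN), `cost` is a squared density-field mismatch, and gradients come from central finite differences where a real system would backpropagate through a differentiable model. `plan`, `mpc`, the horizon, and the step sizes are illustrative.

```python
import numpy as np

def density_field(centers, grid, sigma=0.08):
    """Mean of isotropic Gaussian kernels centred on the splat centers."""
    d2 = ((grid[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2)).mean(axis=1)

def f(z, u):
    """Hypothetical stand-in for the learned dynamics f: translate every center by u."""
    return z + u

def rollout(z, actions):
    for u in actions:
        z = f(z, u)
    return z

def cost(z, target_rho, grid):
    """c(Z_T, Z_target) as a squared density-field mismatch (an assumption)."""
    return float(np.mean((density_field(z, grid) - target_rho) ** 2))

def plan(z, target_rho, grid, actions, iters=40, step=0.05, eps=1e-4):
    """Gradient descent on the action sequence with central-difference gradients."""
    actions = actions.copy()
    best = cost(rollout(z, actions), target_rho, grid)
    for _ in range(iters):
        grad = np.zeros_like(actions)
        for idx in np.ndindex(*actions.shape):
            plus, minus = actions.copy(), actions.copy()
            plus[idx] += eps
            minus[idx] -= eps
            grad[idx] = (cost(rollout(z, plus), target_rho, grid)
                         - cost(rollout(z, minus), target_rho, grid)) / (2.0 * eps)
        norm = np.linalg.norm(grad)
        if norm < 1e-12:
            break
        # Backtracking line search: halve the step until the cost decreases.
        lr, improved = step, False
        while lr > 1e-5 and not improved:
            candidate = actions - lr * grad / norm
            c = cost(rollout(z, candidate), target_rho, grid)
            if c < best:
                actions, best, improved = candidate, c, True
            lr *= 0.5
        if not improved:
            break
    return actions

def mpc(z, z_target, grid, horizon=3, mpc_steps=6):
    """Receding horizon: plan, execute only the first action, shift, replan."""
    target_rho = density_field(z_target, grid)
    actions = np.zeros((horizon, 2))
    costs = [cost(z, target_rho, grid)]
    for _ in range(mpc_steps):
        actions = plan(z, target_rho, grid, actions)
        z = f(z, actions[0])  # on a robot: execute the push, then re-observe Z_t
        actions = np.vstack([actions[1:], np.zeros((1, 2))])  # warm start
        costs.append(cost(z, target_rho, grid))
    return z, costs

rng = np.random.default_rng(0)
pile = rng.normal(0.0, 0.03, size=(30, 2))
goal = pile + np.array([0.3, 0.1])
xs, ys = np.meshgrid(np.linspace(-0.4, 0.7, 24), np.linspace(-0.4, 0.5, 20))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
final, costs = mpc(pile, goal, grid)
```

Executing only the first planned action and then replanning from the new observation lets the loop absorb model error, which matters because the dynamics model stays accurate for only a few steps (Figure 4).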