dGrasp: NeRF-Informed Implicit Grasp Policies with Supervised Optimization Slopes

Gergely Sóti, Xi Huang, Christian Wurll, Björn Hein

Abstract

We present dGrasp, an implicit grasp policy with an enhanced optimization landscape. This landscape is defined by a NeRF-informed grasp value function. The neural network representing this function is trained on simulated grasp demonstrations. During training, we use an auxiliary loss that guides not only the weight updates of this network but also how the slope of the optimization landscape changes. This loss is computed from the demonstrated grasp trajectory and the gradients of the landscape. Using second-order optimization, we incorporate valuable information from the trajectory and facilitate the optimization process of the implicit policy. Experiments demonstrate that this auxiliary loss improves the policies' performance in simulation as well as their zero-shot transfer to the real world.
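
The idea of supervising the slope of the landscape can be illustrated with a toy example. The sketch below is not the paper's formulation: the quadratic value function, the cosine-alignment loss, and the names `value`, `value_grad`, `slope_loss`, and `optimize_action` are all assumptions chosen for illustration. It shows a loss that is small when the action-gradient of the value function points along the demonstrated steps, and a gradient-ascent loop standing in for the implicit policy's inference. In a real network, minimizing such a loss requires differentiating the action-gradient with respect to the weights, which is presumably why second-order optimization is involved.

```python
import math

# Toy stand-in for a learned grasp value function (assumption, not the paper's
# NeRF-informed network): the value peaks at `goal`, and `w` scales its sharpness.
def value(a, goal, w):
    return -w * sum((ai - gi) ** 2 for ai, gi in zip(a, goal))

def value_grad(a, goal, w):
    """Analytic gradient of `value` with respect to the action `a`."""
    return [-2.0 * w * (ai - gi) for ai, gi in zip(a, goal)]

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def slope_loss(trajectory, goal, w):
    """Hypothetical auxiliary loss: mean (1 - cosine similarity) between the
    action-gradient of the value and the demonstrated step a_{t+1} - a_t."""
    total, count = 0.0, 0
    for a_t, a_next in zip(trajectory, trajectory[1:]):
        step = [n - c for c, n in zip(a_t, a_next)]
        grad = value_grad(a_t, goal, w)
        denom = _norm(step) * _norm(grad)
        if denom == 0.0:
            continue  # direction undefined for a zero step or zero gradient
        total += 1.0 - sum(s * g for s, g in zip(step, grad)) / denom
        count += 1
    return total / count if count else 0.0

def optimize_action(a0, goal, w, lr=0.25, steps=20):
    """Implicit-policy inference as plain gradient ascent on the value."""
    a = list(a0)
    for _ in range(steps):
        g = value_grad(a, goal, w)
        a = [ai + lr * gi for ai, gi in zip(a, g)]
    return a
```

For a demonstration `[(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]`, `slope_loss` is approximately 0 when `goal` is `(1.0, 1.0)` and approximately 2 when `goal` is `(-1.0, -1.0)`, since the gradients then point against the demonstrated motion.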

Paper Structure (19 sections, 10 equations, 9 figures, 1 table, 1 algorithm)

This paper contains 19 sections, 10 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Policy Representations - Comparison of policy representations for observation $o$ and a demonstration trajectory $\{a_t, a_{t+1}, \dots, a_{t+4}\}$ in a two-dimensional action space. Implicit Behavior Cloning (IBC) learns an energy function $E(a, o)$ using negative sampling; Diffusion Policy learns a noise function $\varepsilon(a, o)$ that approximates the gradient field of the energy function $\nabla_a E(a,o)$; our approach learns a value function $\Psi(a, o)$ using negative sampling and additionally supervises its gradients $\nabla_a \Psi(a, o)$ using the demonstration trajectories during training. This formulation combines the convenient representation of IBC with the stability and robustness of Diffusion Policy.
  • Figure 2: Computational model for the implicit policy's value function $\Psi$ for a 6-DoF grasp candidate and an observation - First, partial pose decomposition (PPD) is applied to the 6-DoF grasp candidate to obtain a set of 5-DoF support poses, and a feature map is computed from the input observation. Then, for each 5-DoF support pose, a NeRF feature vector is computed using the extracted feature map and a pre-trained NeRF. These are finally processed by the value network to obtain the grasp value for the input 6-DoF grasp candidate.
  • Figure 3: Partial pose decomposition - A set of 5-DoF support poses is computed from an initial 6-DoF pose using predefined transformations. The image shows the TCP as a 6-DoF pose and a possible set of its support poses that correspond to the gripper's geometry: the yellow points with purple direction vectors pointing inwards to characterize possible object boundaries that the gripper could grasp.
  • Figure 4: Computation of support pose NeRF features - Left: Computation of NeRF features for a 5-DoF support pose using its corresponding visual feature vector; Right: network architecture of the NeRF and activation aggregation models. The green arrows represent the activations of the NeRF's last four ResNet blocks that are aggregated to form the NeRF feature corresponding to the input support pose.
  • Figure 5: Network architectures - Value Network: The network processes the support pose NeRF features to compute the final grasp value for the implicit policy; ResNet: Used in NeRF and Value Networks. Transforms the residual shortcut if the input and output dimensions are not equal.
  • ...and 4 more figures
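
The partial pose decomposition described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `partial_pose_decomposition`, the representation of a 6-DoF pose as a position plus a 3x3 rotation matrix, and the two fingertip offsets in `FINGERTIP_SUPPORTS` are hypothetical. Each 5-DoF support pose is represented as a point and a unit direction, so the roll about that direction is left unspecified.

```python
import math

# Hypothetical predefined transformations in the gripper (TCP) frame:
# (offset in metres, inward-pointing direction), one entry per fingertip.
FINGERTIP_SUPPORTS = [
    ((0.04, 0.0, 0.0), (-1.0, 0.0, 0.0)),
    ((-0.04, 0.0, 0.0), (1.0, 0.0, 0.0)),
]

def mat_vec(rotation, v):
    """Multiply a 3x3 rotation matrix (list of rows) by a 3-vector."""
    return [sum(rotation[i][j] * v[j] for j in range(3)) for i in range(3)]

def partial_pose_decomposition(position, rotation, local_supports=FINGERTIP_SUPPORTS):
    """Map a 6-DoF pose to a list of 5-DoF support poses (point, unit direction)."""
    supports = []
    for offset, direction in local_supports:
        # Move the local offset into the world frame and attach it to the TCP.
        world_offset = mat_vec(rotation, offset)
        point = [p + o for p, o in zip(position, world_offset)]
        # Rotate the local direction and normalise it to unit length.
        world_dir = mat_vec(rotation, direction)
        norm = math.sqrt(sum(c * c for c in world_dir))
        supports.append((point, [c / norm for c in world_dir]))
    return supports
```

With an identity rotation and a TCP at `(0.0, 0.0, 0.1)`, the two support points lie 4 cm to either side of the TCP along the x-axis, each with a direction pointing back toward it.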