
A Planar-Symmetric SO(3) Representation for Learning Grasp Detection

Tianyi Ko, Takuya Ikeda, Hiroya Sato, Koichi Nishiwaki

TL;DR

This work addresses the bimodal rotation ambiguity of planar-symmetric grippers with a planar-symmetric SO(3) representation based on a 2D Bingham distribution. The authors encode the two symmetric gripper poses with a single 9-parameter set, train a grasp detector with a joint loss combining cosine similarity and BNLL, and recover the rotation at inference time via eigendecomposition or sampling. Experiments show improved rotation continuity, informative uncertainty estimates, and higher grasp success and clarity in both simulation and real-robot setups, especially in yaw-critical scenarios with large, flat objects. The approach serves as a practical add-on to detectors that regress rotation directly, improving yaw robustness without substantial computational overhead.

Abstract

Planar-symmetric hands, such as parallel grippers, are widely adopted in both research and industrial fields. Their symmetry, however, introduces ambiguity and discontinuity in the SO(3) representation, which hinders both the training and inference of neural-network-based grasp detectors. We propose a novel SO(3) representation that can parametrize a pair of planar-symmetric poses with a single parameter set by leveraging the 2D Bingham distribution. We also detail a grasp detector based on our representation, which provides a more consistent rotation output. An intensive evaluation with multiple grippers and objects in both the simulation and the real world quantitatively shows our approach's contribution.
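
To make the "single parameter set" idea concrete, here is a short sketch of why a Bingham density fits this symmetry. It uses the standard form of the Bingham density. The symbols $\bm{x}$ and $F(A)$ are introduced here for illustration and are not taken from the paper's equations. The sketch assumes that the distribution lives on the unit sphere $S^2$ (consistent with a matrix $A$ that has three eigenvalues) and that the approach direction lies in the hand's symmetric plane.

```latex
\begin{align}
  p(\bm{x}; A) &= \frac{1}{F(A)} \exp\left(\bm{x}^\top A\, \bm{x}\right),
  \qquad \bm{x} \in S^2,\quad A = A^\top \in \mathbb{R}^{3 \times 3}, \\
  p(-\bm{x}; A) &= \frac{1}{F(A)} \exp\left((-\bm{x})^\top A\, (-\bm{x})\right)
  = p(\bm{x}; A).
\end{align}
```

Under these assumptions, a 180$^\circ$ flip around the approach direction maps the symmetric plane's normal $\bm{e}_z$ to $-\bm{e}_z$. Both poses then receive the same density, so one $A$ can describe the pair.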

Paper Structure (20 sections, 18 equations, 10 figures, 2 tables)


Figures (10)

  • Figure 1: (a) For symmetric grippers, two distinct rotations (180$^\circ$-flipped around the approach direction) that represent the same grasp cause inconsistency and ambiguity. (b) We propose a novel planar-symmetric $\mathop{\mathrm{SO}}(3)$ representation that can express a pair of poses with a single parameter set. (c-e) It also provides deviation information, which is beneficial at inference time.
  • Figure 2: Architecture of our grasp detection network with the hand symmetric plane's normal vector $\bm{e}_z$ expressed in the 2D Bingham representation. We can either sample $\bm{e}_z$ from the Bingham distribution $\mathfrak{B}(A)$ or directly perform an eigenvalue decomposition of $A$ and take the eigenvector with the largest eigenvalue as $\bm{e}_z$, while taking the difference of the three eigenvalues as the confidence.
  • Figure 3: Cross-section of the network's output grasp score field and rotation field at the $z=57$ mm plane when a long box is aligned with the workspace origin. Left: baseline. Right: ours.
  • Figure 4: Same plot as Fig. 3, but with a flat cylinder placed next to the box. This time we sample 30 $\bm{e}_z$ from $\mathfrak{B}(A)$. The uncertainty on the box is small because there are no other possible choices to grasp the long box, which corresponds to the case in Fig. 1(c). The distribution at the center of the cylinder is close to uniform because any downward grasp in that region is affordable. This corresponds to the case in Fig. 1(e).
  • Figure 5: Captures of the simulation experiment with two kinds of grippers and objects.
  • ...and 5 more figures
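
The eigendecomposition route described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names `decode_symmetric_normal` and `make_A` are hypothetical, treating the network output as a 3x3 matrix `A` is an assumption, and the confidence used here (gap between the two largest eigenvalues) is only one possible reading of "the difference of the three eigenvalues".

```python
import numpy as np


def decode_symmetric_normal(A):
    """Return (e_z, confidence) from a 3x3 parameter matrix A.

    Assumptions (not confirmed by the source): A is a raw 3x3 output,
    and confidence is the gap between the two largest eigenvalues.
    """
    A = np.asarray(A, dtype=float)
    # x^T A x depends only on the symmetric part of A, so symmetrize first.
    A_sym = 0.5 * (A + A.T)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(A_sym)
    e_z = eigvecs[:, -1]  # eigenvector of the largest eigenvalue
    confidence = eigvals[-1] - eigvals[-2]
    return e_z, confidence


def make_A(axis, concentration):
    """Hypothetical helper: build A concentrated around +/- axis."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return concentration * np.outer(axis, axis)


target = np.array([0.0, 0.6, 0.8])
A_pos = make_A(target, 5.0)
A_neg = make_A(-target, 5.0)  # the 180-degree-flipped normal

# Both flipped normals map to the same parameter matrix.
same_parameters = np.allclose(A_pos, A_neg)

# The decoded axis matches the target up to sign.
e_z, conf = decode_symmetric_normal(A_pos)
alignment = abs(float(e_z @ target))
```

The sign of the returned `e_z` is arbitrary, which is expected here: `e_z` and `-e_z` describe the same planar-symmetric grasp, so a downstream consumer can pick either one.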