Optimizing Robotic Placement via Grasp-Dependent Feasibility Prediction

Tianyuan Liu, Richard Dazeley, Benjamin Champion, Akan Cosgun

TL;DR

The paper tackles efficient robotic pick-and-place by learning to score precomputed grasp-place candidates using cheap, physics-free labels. It introduces two signals, path-wise IK feasibility and transit collision risk, which a compact dual-output MLP learns and uses to rank candidates before planning. Under a fixed planning budget in physics-enabled execution, a rank-then-plan policy that shares the baseline's IK gate and planner reaches successful paths sooner and with fewer planner calls, with final success on par or better. The approach is demonstrated on a single rigid cuboid with side-face grasps and a fixed waypoint template; the authors outline extensions to varied objects and richer waypoint schemes.

Abstract

In this paper, we study whether inexpensive, physics-free supervision can reliably prioritize grasp-place candidates for budget-aware pick-and-place. From an object's initial pose, target pose, and a candidate grasp, we generate two path-aware geometric labels: path-wise inverse kinematics (IK) feasibility across a fixed approach-grasp-lift waypoint template, and a transit collision flag from mesh sweeps along the same template. A compact dual-output MLP learns these signals from pose encodings, and at test time its scores rank precomputed candidates for a rank-then-plan policy under the same IK gate and planner as the baseline. Although learned from cheap labels only, the scores transfer to physics-enabled executed trajectories: at a fixed planning budget the policy finds successful paths sooner with fewer planner calls while keeping final success on par or better. This work targets a single rigid cuboid with side-face grasps and a fixed waypoint template, and we outline extensions to varied objects and richer waypoint schemes.
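The rank-then-plan policy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`combined_score`, `rank_then_plan`), the candidate representation, and in particular the rule for combining the two predicted scores into one ranking key are assumptions, since the abstract does not specify them.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def combined_score(p_ik: float, p_col: float) -> float:
    """Hypothetical ranking key: high IK feasibility, low collision risk.

    The product form is an assumption made for illustration only.
    """
    return p_ik * (1.0 - p_col)


def rank_then_plan(
    candidates: Sequence[Any],
    score_fn: Callable[[Any], float],
    ik_gate: Callable[[Any], bool],
    plan_fn: Callable[[Any], Optional[List[Any]]],
    budget: int,
) -> Tuple[Optional[Any], Optional[List[Any]], int]:
    """Try candidates in descending score order under a planner-call budget.

    Returns (candidate, path, planner_calls); candidate and path are None
    if no plan was found before the budget ran out.
    """
    ranked = sorted(candidates, key=score_fn, reverse=True)
    planner_calls = 0
    for cand in ranked:
        if planner_calls >= budget:
            break
        if not ik_gate(cand):  # cheap check; does not consume planner budget
            continue
        planner_calls += 1
        path = plan_fn(cand)  # stand-in for a motion planner call
        if path is not None:
            return cand, path, planner_calls
    return None, None, planner_calls
```

A baseline under the same IK gate and planner would correspond to calling the same loop with an uninformative `score_fn` (for example, a constant), so any difference in `planner_calls` comes from the ordering alone.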

Paper Structure

This paper contains 23 sections, 11 equations, 4 figures, 2 tables, 2 algorithms.

Figures (4)

  • Figure 1: (A) Training flow, no physics. Sample object and grasp poses, build the canonical P–G–L path, compute path-wise IK and path-sweep collision labels, and train a dual-output MLP. (B) Execution in simulation, physics enabled. Score held-out labels and run 10k executed-sim trajectories by planning with RRT-Connect and executing in PhysX to compare predictions to outcomes. (C) A dual-output MLP scores the triplet $(^{W}\!T_{O_i},\,^{O_i}\!T_G,\,^{W}\!T_{O_f})$. Inputs are three pose blocks: the grasp in the initial object frame, $\phi_G=[t_G,\,r(R_G)]$, and the object poses at pick and place in the world frame, $\phi_i=[t_i,\,r(R_i)]$ and $\phi_f=[t_f,\,r(R_f)]$, where $r(\cdot)$ is the 6D rotation representation (the first two columns of the rotation matrix). Two optional descriptors are used for ablation studies: the final-pose corner positions ($8{\times}3$) in the world frame and a grasp meta vector $\phi_m=[u_{\mathrm{frac}},v_{\mathrm{frac}},\mathrm{onehot}(\mathrm{face})]$, where $u_{\mathrm{frac}}, v_{\mathrm{frac}} \in [0,1]$ are fractional coordinates along the grasped face and $\mathrm{onehot}(\mathrm{face})$ indicates which face is grasped. The available blocks are concatenated into a fused feature embedding $z$ and processed by a trunk MLP (128–128–64) with two heads predicting IK and collision scores. (D) An example scene from an actual trajectory: a 7-DoF arm moves a single cuboid between pedestals using the canonical pre-grasp (P), grasp (G), lift (L) template. The grey box is the actual object; the green box shows the target pose for visualization only.
  • Figure 2: Pedestal layout of the experimental setup, with the robot placed at the origin.
  • Figure 3: Comparison of object orientations. Left: a stable pose in which the object can rest on a surface (feasible placement). Right: an unstable orientation that the object cannot maintain (infeasible pose).
  • Figure 4: Grasp parameterization (single composite view). Left: in-plane lattice on a side face — $u$ runs along the longer in-plane edge and $v$ along the shorter; black dots are candidate contacts and the hollow dot marks $(u{=}0.5, v{=}0.5)$. Right: the corresponding grasp on the object — red: gripper-closing axis, green: binormal axis, blue: approach axis; the face normal $n$ points opposite the local $z$ shown (i.e., $n = -\hat{z}$ for that face).
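
The input encoding and network shape described in the Figure 1(C) caption can be sketched in NumPy as below. This is a hedged illustration under stated assumptions: only the three mandatory pose blocks are used (giving a 27-dimensional $z$), and the ReLU activations, sigmoid output heads, random initialization, and all function names (`rot6d`, `pose_block`, `fuse`, `init_params`, `forward`) are choices made here that the caption does not specify.

```python
import numpy as np


def rot6d(R):
    """6D rotation encoding: first two columns of a 3x3 rotation matrix."""
    R = np.asarray(R, dtype=float)
    return np.concatenate([R[:, 0], R[:, 1]])


def pose_block(t, R):
    """One pose block [t, r(R)]: 3 translation + 6 rotation values."""
    return np.concatenate([np.asarray(t, dtype=float), rot6d(R)])


def fuse(t_G, R_G, t_i, R_i, t_f, R_f):
    """Concatenate grasp, initial, and final pose blocks into z (27-dim)."""
    return np.concatenate(
        [pose_block(t_G, R_G), pose_block(t_i, R_i), pose_block(t_f, R_f)]
    )


def init_params(in_dim, rng):
    """Random weights for a 128-128-64 trunk and two scalar heads."""
    sizes = [in_dim, 128, 128, 64]
    trunk = [
        (rng.normal(0.0, np.sqrt(2.0 / a), (a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]
    heads = {
        name: (rng.normal(0.0, np.sqrt(1.0 / 64), (64, 1)), np.zeros(1))
        for name in ("ik", "col")
    }
    return trunk, heads


def forward(z, params):
    """Forward pass; ReLU trunk and sigmoid heads are assumptions."""
    trunk, heads = params
    h = z
    for W, b in trunk:
        h = np.maximum(h @ W + b, 0.0)
    out = {}
    for name, (W, b) in heads.items():
        logit = (h @ W + b)[0]
        out[name] = float(1.0 / (1.0 + np.exp(-logit)))
    return out


# Usage with identity rotations and arbitrary translations.
I = np.eye(3)
z = fuse([0.0, 0.0, 0.05], I, [0.5, 0.2, 0.3], I, [0.5, -0.2, 0.3], I)
scores = forward(z, init_params(z.shape[0], np.random.default_rng(0)))
```

With untrained weights the two scores are meaningless; the sketch only shows how the pose blocks map to a fused vector and how one shared trunk feeds two separate heads.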