Dynamic Grasping with a Learned Meta-Controller

Yinsen Jia; Jingxi Xu; Dinesh Jayaraman; Shuran Song

TL;DR

The paper tackles dynamic grasping of moving objects by introducing a learned meta-controller that adaptively tunes two meta-parameters at every iteration: the look-ahead time $T_L$ for pose prediction and the time budget $T_T$ for the motion planner. A PPO-trained meta-controller takes a history of scene information and outputs continuous values of $T_L$ and $T_T$, coordinating the object pose predictor, grasp planner, and arm motion planner. In simulation, the meta-controller improves grasping success by up to 28% over the strongest baseline in the most cluttered environment and reduces grasping time, and it generalizes to more obstacles and to unseen obstacle shapes. The same idea may extend beyond grasping to other robotics tasks that couple multiple subsystems with interdependent meta-parameters.

Abstract

Grasping moving objects is a challenging task that requires multiple submodules, such as an object pose predictor and an arm motion planner. Each submodule operates under its own set of meta-parameters, for example how far the pose predictor should look into the future (the look-ahead time) and the maximum amount of time the motion planner can spend planning a motion (the time budget). Many previous works assign fixed values to these parameters; however, at different moments within a single episode of dynamic grasping, the optimal values should vary depending on the current scene. In this work, we propose a dynamic grasping pipeline with a meta-controller that controls the look-ahead time and time budget dynamically. We learn the meta-controller through reinforcement learning with a sparse reward. Our experiments show that, compared to the strongest baseline, the meta-controller improves the grasping success rate (by up to 28% in the most cluttered environment) and reduces grasping time. Our meta-controller learns to reason about the reachable workspace and to keep the predicted pose within the reachable region. In addition, it assigns a small but sufficient time budget to the motion planner. Our method can handle different objects, trajectories, and obstacles. Despite being trained only with 3-6 random cuboidal obstacles, our meta-controller generalizes well to 7-9 obstacles and to more realistic out-of-domain household setups with unseen obstacle shapes.
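To make the per-iteration coordination concrete, here is a minimal Python sketch of one pipeline iteration, assuming the flow described above and in the Figure 2 caption: the meta-controller sees scene information stacked from the past 5 iterations and returns $T_L$ and $T_T$, which are passed to the pose predictor and the motion planner. Every name here (`MetaAction`, `stack_state`, `run_iteration`, the callables and their signatures) is hypothetical, and the zero-padding of an incomplete history is an assumption for illustration, not something the paper states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional, Sequence, Tuple

HISTORY_LEN = 5  # Figure 2: the state stacks scene information from the past 5 iterations

Scene = Tuple[float, ...]  # hypothetical flat encoding of one iteration's scene information
Pose = Tuple[float, ...]   # hypothetical pose encoding


@dataclass(frozen=True)
class MetaAction:
    look_ahead_time: float  # T_L in seconds, consumed by the pose predictor
    time_budget: float      # T_T in seconds, consumed by the motion planner


def stack_state(history: Sequence[Scene], scene_dim: int) -> List[float]:
    """Flatten the stored scenes, oldest first.

    Assumption: missing early iterations are zero-padded at the front.
    """
    missing = HISTORY_LEN - len(history)
    flat = [0.0] * (missing * scene_dim)
    for scene in history:
        flat.extend(scene)
    return flat


def run_iteration(
    scene: Scene,
    history: Deque[Scene],
    meta_controller: Callable[[List[float]], MetaAction],
    predict_pose: Callable[[Scene, float], Pose],
    plan_grasp: Callable[[Pose], Pose],
    plan_motion: Callable[[Pose, float], Optional[list]],
) -> Tuple[MetaAction, Optional[list]]:
    """Run one iteration: choose (T_L, T_T), predict, plan a grasp, plan a motion."""
    history.append(scene)  # a deque with maxlen=HISTORY_LEN drops the oldest entry
    state = stack_state(history, len(scene))
    action = meta_controller(state)
    target_pose = predict_pose(scene, action.look_ahead_time)
    grasp = plan_grasp(target_pose)
    # The planner is expected to return None if no path is found within the budget.
    path = plan_motion(grasp, action.time_budget)
    return action, path
```

A caller would create `history = deque(maxlen=HISTORY_LEN)` once per episode and call `run_iteration` in a loop until the grasp is executed or the episode ends. In the paper the meta-controller is a PPO policy; here it is just a callable so the sketch stays independent of any RL library.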

Paper Structure (18 sections, 1 equation, 9 figures, 2 tables)

Introduction
Related Work
Dynamic Grasping
Delay-accuracy Trade-off
Method
Dynamic Grasping Pipeline
Learning a Meta-controller
State and Action Space
Training
Experiments
Experimental Setups
Baselines
Performance Analysis and Discussion
Conclusion
Appendix
...and 3 more sections

Figures (9)

Figure 1: Meta-parameters in dynamic grasping. The green dots indicate the predicted trajectory for the whole predictable range (8s). The red periodic rectangular line indicates the object's trajectory. The semi-transparent objects indicate the predicted poses at different look-ahead times. If the robot uses a large look-ahead time (5s), the predicted pose falls into a highly reachable space where the motion planner can plan a motion quickly (0.5s). If the robot uses a small look-ahead time (1s), the predicted pose lies in a highly cluttered area where a collision-free path takes longer to plan (8s). In this example, our meta-controller uses a large look-ahead time and a small time budget, so the robot can immediately plan a collision-free motion and move.
Figure 2: Dynamic grasping with a learned meta-controller. We stack a sequence of scene information from the past 5 iterations as the state input to our meta-controller. Our meta-controller generates for the current iteration a look-ahead time $T_L$ for the object pose predictor, and a time budget $T_T$ for the motion planner. We model the meta-controller as a PPO agent trained with RL and a sparse reward.
Figure 3: Meta-controller training plot. Our meta-controller is trained for up to 150,000 episodes (around 24 hours) with 5 random seeds. We plot the mean of the running success rate over the past 1000 episodes; the shaded region shows 1 standard deviation.
Figure 4: Experimental setups. We show two randomly selected examples for each of our 4 setups. In each example, the target object is on the conveyor at the start of the trajectory, and the robot arm is in a random initial configuration.
Figure 5: Meta-controller demonstration on the Cluttered Household setup. (a) Scenes at different iterations within the same episode. Green dots indicate the predicted trajectory for the whole predictable range. The semi-transparent object indicates the predicted pose at the look-ahead time. (b) Trajectory reachability visualization. The whole trajectory and its offline-computed reachability are projected vertically onto the $y$-axis of the plot. (c) The assigned time budget by our meta-controller for each iteration. (d) The assigned look-ahead time by our meta-controller for each iteration.
...and 4 more figures
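
The curve described in the Figure 3 caption (a running success rate over the past 1000 episodes, averaged over seeds with a 1-standard-deviation band) can be computed along the following lines. This is a sketch under assumptions: the function names are hypothetical, early episodes are averaged over however many outcomes exist so far, and the population standard deviation is used because the caption does not say which estimator the authors chose.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def running_success_rate(outcomes: Sequence[int], window: int = 1000) -> List[float]:
    """Success rate over the most recent `window` episodes, one value per episode."""
    recent = deque(maxlen=window)
    total = 0
    rates = []
    for outcome in outcomes:
        if len(recent) == window:
            total -= recent[0]  # this entry is evicted by the append below
        recent.append(int(outcome))
        total += int(outcome)
        rates.append(total / len(recent))
    return rates


def mean_and_std_across_seeds(
    curves: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Per-episode mean and population standard deviation across seed curves."""
    n = min(len(curve) for curve in curves)  # truncate to the shortest run
    means = [mean([curve[i] for curve in curves]) for i in range(n)]
    stds = [pstdev([curve[i] for curve in curves]) for i in range(n)]
    return means, stds
```

With 5 seeds, `curves` would hold 5 lists produced by `running_success_rate`, and the shaded band would span `means[i] - stds[i]` to `means[i] + stds[i]`.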
