Dynamic Grasping with a Learned Meta-Controller
Yinsen Jia, Jingxi Xu, Dinesh Jayaraman, Shuran Song
TL;DR
The paper tackles dynamic grasping of moving objects by introducing a learned meta-controller that adaptively tunes two critical meta-parameters on every iteration: the look-ahead time $T_L$ for pose prediction and the motion-planner time budget $T_T$. It presents a dynamic grasping pipeline in which a PPO-trained meta-controller uses a history of scene information to output continuous values of $T_L$ and $T_T$, coordinating the object pose predictor, grasp planner, and arm motion planner. In simulation, the method improves grasping success by up to 28% over the strongest baseline (in the most cluttered environment) and reduces grasping time. It also generalizes well to more obstacles and to unseen obstacle shapes. The same idea may extend beyond grasping to other robotics tasks that couple multiple subsystems with interdependent meta-parameters, supporting more robust and efficient real-time decision-making in dynamic scenes.
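The meta-controller's interface (scene history in, continuous $T_L$ and $T_T$ out) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the bounds `T_L_RANGE` and `T_T_RANGE`, the history length, the tanh squashing, and the names `MetaController` and `squash_to_range` are all hypothetical. `policy` is any callable standing in for a trained actor network such as a PPO policy.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical bounds in seconds; the paper's actual ranges are not given here.
T_L_RANGE = (0.0, 2.0)   # look-ahead time T_L
T_T_RANGE = (0.05, 1.0)  # motion-planner time budget T_T


@dataclass
class MetaParams:
    look_ahead: float   # T_L in seconds
    time_budget: float  # T_T in seconds


def squash_to_range(raw: float, low: float, high: float) -> float:
    """Map an unbounded policy output into [low, high] using tanh."""
    return low + (math.tanh(raw) + 1.0) * 0.5 * (high - low)


class MetaController:
    """Keeps a fixed-length history of scene features and queries a policy.

    `policy` is a stand-in for a trained network (e.g. a PPO actor): any
    callable mapping a flat observation list to two raw outputs.
    """

    def __init__(self, policy: Callable[[List[float]], Tuple[float, float]],
                 history_len: int = 4):
        self.policy = policy
        self.history: deque = deque(maxlen=history_len)

    def step(self, scene_features: Sequence[float]) -> MetaParams:
        self.history.append(list(scene_features))
        frames = list(self.history)
        # Until the buffer fills, repeat the oldest frame so the
        # observation passed to the policy always has a fixed size.
        while len(frames) < self.history.maxlen:
            frames.insert(0, frames[0])
        obs = [value for frame in frames for value in frame]
        raw_tl, raw_tt = self.policy(obs)
        return MetaParams(
            look_ahead=squash_to_range(raw_tl, *T_L_RANGE),
            time_budget=squash_to_range(raw_tt, *T_T_RANGE),
        )


if __name__ == "__main__":
    # A constant stub policy in place of a trained actor.
    controller = MetaController(lambda obs: (0.0, -0.5))
    print(controller.step([0.3, -0.1, 0.0]))
```

Squashing the raw outputs keeps every proposed $T_L$ and $T_T$ inside a valid interval regardless of what the network emits, which is a common choice for continuous-action policies.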
Abstract
Grasping moving objects is a challenging task that requires multiple submodules, such as an object pose predictor and an arm motion planner. Each submodule operates under its own set of meta-parameters, for example, how far into the future the pose predictor should look (the look-ahead time) and the maximum amount of time the motion planner may spend planning a motion (the time budget). Many previous works assign fixed values to these parameters; however, the optimal values vary from moment to moment within a single episode of dynamic grasping, depending on the current scene. In this work, we propose a dynamic grasping pipeline with a meta-controller that adjusts the look-ahead time and time budget dynamically. We learn the meta-controller through reinforcement learning with a sparse reward. Our experiments show that the meta-controller improves the grasping success rate (by up to 28% in the most cluttered environment) and reduces grasping time compared to the strongest baseline. Our meta-controller learns to reason about the reachable workspace and to keep the predicted pose within the reachable region. In addition, it assigns a small but sufficient time budget to the motion planner. Our method handles different objects, trajectories, and obstacles. Despite being trained only with 3-6 random cuboidal obstacles, our meta-controller generalizes well to 7-9 obstacles and to more realistic out-of-domain household setups with unseen obstacle shapes.
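The interplay between look-ahead time and reachability, and the sparse reward, can be illustrated with a small sketch. Everything here is an assumption for illustration: the constant-velocity predictor, the planar disk used as a reachable workspace (`max_reach`), and the 0/1 reward form are not taken from the paper. `largest_reachable_look_ahead` is a hand-coded heuristic shown only for contrast; it is not the learned meta-controller.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ObjectState:
    x: float   # planar position (m)
    y: float
    vx: float  # planar velocity (m/s)
    vy: float


def predict_pose(state: ObjectState, look_ahead: float) -> Tuple[float, float]:
    """Assumed constant-velocity predictor: extrapolate T_L seconds ahead."""
    return (state.x + state.vx * look_ahead, state.y + state.vy * look_ahead)


def is_reachable(point: Tuple[float, float],
                 base: Tuple[float, float] = (0.0, 0.0),
                 max_reach: float = 0.8) -> bool:
    """Hypothetical reachability check: a planar disk around the arm base."""
    return math.dist(point, base) <= max_reach


def largest_reachable_look_ahead(state: ObjectState,
                                 candidates: Sequence[float]) -> Optional[float]:
    """Hand-coded heuristic, NOT the learned meta-controller: pick the
    largest candidate T_L whose predicted pose is still reachable."""
    feasible = [t for t in candidates if is_reachable(predict_pose(state, t))]
    return max(feasible) if feasible else None


def sparse_reward(episode_done: bool, grasp_succeeded: bool) -> float:
    """Sparse reward: 1.0 only at the end of a successful episode, else 0.0."""
    return 1.0 if (episode_done and grasp_succeeded) else 0.0


if __name__ == "__main__":
    obj = ObjectState(x=0.5, y=0.0, vx=0.2, vy=0.0)
    print(largest_reachable_look_ahead(obj, [0.5, 1.0, 2.0]))
```

For an object at 0.5 m moving away at 0.2 m/s, a 2 s look-ahead predicts a pose outside the assumed 0.8 m reach, so the heuristic settles on 1 s. A fixed rule like this ignores obstacles and planning time; the abstract's point is that the learned controller selects these values from scene history instead, with only the sparse end-of-episode signal to learn from.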
