GeT-USE: Learning Generalized Tool Usage for Bimanual Mobile Manipulation via Simulated Embodiment Extensions
Bohan Wu, Paul de La Sayette, Li Fei-Fei, Roberto Martín-Martín
TL;DR
GeT-USE addresses generalized tool usage for bimanual mobile manipulators by learning embodiment extensions in simulation to identify effective tool geometries, then distilling this knowledge into vision-based modules for real-world use. It is a two-step process: first, train a tool-building policy π_gtb that extends the robot's end-effectors; second, train a generalized tool selector D_gts, a grasping policy π_gtg, and a manipulation policy π_gtm that perform tool usage from depth images, enabling zero-shot sim-to-real transfer. On a 22-DOF TIAGo robot with 6-DOF end-effector control, the approach outperforms state-of-the-art baselines built on crowd-sourced and procedurally generated tools by 30-60% in success rate on three tasks (Sweeping, Hook_and_grasp, Decanting). The result is a robot that can choose and use varied tools in unstructured environments with less reliance on handcrafted tools and curated datasets.
Abstract
The ability to use random objects as tools in a generalizable manner is a missing piece of robot intelligence today, one that would boost robots' versatility and problem-solving capabilities. State-of-the-art robotic tool usage methods focus on procedurally generating or crowd-sourcing datasets of tools for a task, then learning how to grasp and manipulate them for that task. However, these methods assume that only one object is provided and that it is possible, with the correct grasp, to perform the task; they cannot identify, grasp, and use the best object for a task when many are available, especially when the optimal tool is absent. In this work, we propose GeT-USE, a two-step procedure that learns to perform real-robot generalized tool usage by first learning to extend the robot's embodiment in simulation and then transferring the learned strategies to real-robot visuomotor policies. Our key insight is that by exploring a robot's embodiment extensions (i.e., building new end-effectors) in simulation, the robot can identify the general tool geometries most beneficial for a task. This learned geometric knowledge can then be distilled to perform generalized tool usage tasks by selecting and using the best available real-world object as a tool. On a real robot with 22 degrees of freedom (DOFs), GeT-USE outperforms state-of-the-art methods by 30-60% in success rate across three vision-based bimanual mobile manipulation tool-usage tasks.
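As an illustration only, the selection step described above (picking the best available object when several are present) can be sketched as an argmax over candidate objects under a scoring function. The names `Candidate`, `select_tool`, and `length_score` are hypothetical and do not come from the paper. The toy scorer stands in for a learned selector such as D_gts, which we assume scores objects from depth observations rather than from the hand-written feature used here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """A candidate object in the scene (hypothetical container)."""
    name: str
    features: tuple  # placeholder for a geometric descriptor of the object


def select_tool(candidates: Sequence[Candidate],
                score: Callable[[Candidate], float]) -> Candidate:
    """Return the candidate with the highest score for the task."""
    if not candidates:
        raise ValueError("no candidate objects available")
    return max(candidates, key=score)


def length_score(candidate: Candidate) -> float:
    """Toy scorer (assumption): treat the first feature as object length."""
    return float(candidate.features[0])


scene = [
    Candidate("mug", (0.10,)),
    Candidate("broom", (1.20,)),
    Candidate("ruler", (0.30,)),
]
best = select_tool(scene, length_score)
print(best.name)
```

Even when no object is ideal for the task, the argmax still returns the highest-scoring object available, which mirrors the setting the abstract describes where the optimal tool is absent.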
