FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation

Chao Tang; Anxing Xiao; Yuhong Deng; Tianrun Hu; Wenlong Dong; Hanbo Zhang; David Hsu; Hong Zhang

TL;DR

This work tackles one-shot imitation learning (OSIL) for tool manipulation by addressing intra-function geometric variation with a function-centric approach. FUNCTO introduces a 3D functional keypoint representation (function point, grasp point, and center point) and a three-stage pipeline (functional keypoint extraction, function-centric correspondence establishment, and functional keypoint-based action planning) to transfer skills from a single demonstration to novel tools. Real-robot experiments show that FUNCTO outperforms modular OSIL methods and end-to-end behavioral cloning (BC) baselines, generalizing to unseen tools while keeping the planned actions feasible for the task. By establishing functional rather than purely geometric correspondences, and by using vision-language prompts for keypoint detection and refinement, the method offers a data-efficient path to robust robotic tool use.

Abstract

Learning tool use from a single human demonstration video offers a highly intuitive and efficient approach to robot teaching. While humans can effortlessly generalize a demonstrated tool manipulation skill to diverse tools that support the same function (e.g., pouring with a mug versus a teapot), current one-shot imitation learning (OSIL) methods struggle to do so. A key challenge lies in establishing functional correspondences between demonstration and test tools, given the significant geometric variations among tools with the same function (i.e., intra-function variations). To address this challenge, we propose FUNCTO (Function-Centric OSIL for Tool Manipulation), an OSIL method that establishes function-centric correspondences with a 3D functional keypoint representation. This enables robots to generalize tool manipulation skills from a single human demonstration video to novel tools with the same function despite significant intra-function variations. With this formulation, we factorize FUNCTO into three stages: (1) functional keypoint extraction, (2) function-centric correspondence establishment, and (3) functional keypoint-based action planning. We evaluate FUNCTO against existing modular OSIL methods and end-to-end behavioral cloning methods through real-robot experiments on diverse tool manipulation tasks. The results show that FUNCTO outperforms these baselines when generalizing to novel tools with intra-function geometric variations. More details are available at https://sites.google.com/view/functo.
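The abstract does not specify how the three functional keypoints drive action planning. Purely as an illustration, the sketch below assumes one simple way corresponded keypoints could be used: build a local frame on each tool from its keypoints (origin at the function point, x-axis toward the center point, z-axis normal to the plane containing the grasp point) and compute the rigid transform that maps the demonstration tool's frame onto the test tool's frame. The names `FunctionalKeypoints`, `transfer_transform`, and `apply_transform`, and the frame convention itself, are hypothetical and are not FUNCTO's actual planner.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(a: Vec) -> Vec:
    n = math.sqrt(sum(x * x for x in a))
    if n < 1e-9:
        raise ValueError("degenerate (coincident or collinear) keypoints")
    return (a[0] / n, a[1] / n, a[2] / n)


@dataclass
class FunctionalKeypoints:
    """Hypothetical container for one tool's 3D functional keypoints."""
    function: Vec
    grasp: Vec
    center: Vec

    def frame(self) -> tuple[Vec, tuple[Vec, Vec, Vec]]:
        """Assumed convention: origin at the function point, right-handed axes."""
        x = normalize(sub(self.center, self.function))
        z = normalize(cross(x, sub(self.grasp, self.function)))
        y = cross(z, x)  # already unit length because z is orthogonal to x
        return self.function, (x, y, z)


def rotate(R: list[list[float]], p: Vec) -> Vec:
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))


def transfer_transform(demo: FunctionalKeypoints,
                       test: FunctionalKeypoints) -> tuple[list[list[float]], Vec]:
    """Rigid (R, t) mapping the demo keypoint frame onto the test keypoint frame."""
    o_d, Fd = demo.frame()
    o_t, Ft = test.frame()
    # R = sum_k Ft_k Fd_k^T, so R sends each demo axis to the matching test axis.
    R = [[sum(Ft[k][i] * Fd[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = sub(o_t, rotate(R, o_d))
    return R, t


def apply_transform(R: list[list[float]], t: Vec, p: Vec) -> Vec:
    """Map a point (e.g., a demonstrated waypoint) from demo to test coordinates."""
    rp = rotate(R, p)
    return (rp[0] + t[0], rp[1] + t[1], rp[2] + t[2])


if __name__ == "__main__":
    demo = FunctionalKeypoints(function=(0.0, 0.0, 0.0),
                               grasp=(0.05, 0.2, 0.0),
                               center=(0.1, 0.0, 0.0))
    # Test tool: the demo keypoints rotated 90 degrees about z, then shifted.
    test = FunctionalKeypoints(function=(1.0, 2.0, 3.0),
                               grasp=(0.8, 2.05, 3.0),
                               center=(1.0, 2.1, 3.0))
    R, t = transfer_transform(demo, test)
    print(apply_transform(R, t, demo.grasp))
```

Because the frame is built only from keypoint directions, this sketch matches the function point and the function-to-center direction exactly; on a test tool with different proportions, the transferred grasp point will generally not coincide with the test tool's own grasp point. Handling that kind of intra-function variation is what the function-centric correspondence stage is meant to address, and this toy transform does not capture it.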
