Embodied Perception for Test-time Grasping Detection Adaptation with Knowledge Infusion

Jin Liu; Jialong Xie; Leibing Xiao; Chaoqun Wang; Fengyu Zhou

TL;DR

The paper tackles the generalization gap of robotic grasp detection in unseen environments by proposing an embodied test-time adaptation framework that combines active exploration with a knowledge infusion mechanism. The system consists of three parts: a Grasping Knowledge Retrieval Module, an Embodied Perception Module with pre-distributed viewpoints, and a Network Optimization Module. Together they adapt both the grasp detector and the knowledge base online, without annotations. Key contributions include a knowledge-pool-guided initialization of viewpoints, a quality assessment based on embodied parameters that retains only high-quality samples, and a joint optimization objective that combines $\mathcal{L}_{act}$ and $\mathcal{L}_{know}$ for continuous online learning. Real-robot experiments show significant improvements in cross-domain and same-domain grasping accuracy over strong baselines, which supports the framework's practical value for autonomous, label-free adaptation in dynamic environments.
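The summary above says only that the objective combines $\mathcal{L}_{act}$ and $\mathcal{L}_{know}$; it does not give the form of the combination. One common choice, assumed here purely for illustration, is a weighted sum, where the trade-off weight $\lambda$ is a hypothetical symbol that does not appear in the source:

```latex
% Illustrative assumption: a weighted sum of the two loss terms.
% \lambda \ge 0 is a hypothetical trade-off weight, not taken from the paper.
\mathcal{L}_{total} = \mathcal{L}_{act} + \lambda \, \mathcal{L}_{know}
```

Under this assumption, $\lambda = 0$ would adapt the detector from the actively collected samples alone, while larger values would put more weight on the knowledge-base term.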

Abstract

It has long been expected that a robot can be easily deployed to unknown scenarios and accomplish grasping tasks without human intervention. Nevertheless, existing grasp detection approaches are typically off-body techniques, realized by training deep neural networks on extensive annotated data. In this paper, we propose an embodied test-time adaptation framework for grasp detection that exploits the robot's exploratory capabilities. The framework aims to improve how well a robot's grasping skills generalize to an unforeseen environment. Specifically, we introduce embodied assessment criteria, based on the robot's manipulation capability, to evaluate the quality of grasp detections and retain suitable samples. This process enables the robot to actively explore the environment and continuously learn grasping skills without human intervention. In addition, to make exploration more efficient, we construct a flexible knowledge base that supplies context for choosing good initial viewpoints. Conditioned on the retained samples, the grasp detection network is adapted to the test-time scene. When the robot encounters new objects, it repeats the same adaptation procedure, which realizes continuous learning. Extensive experiments conducted on a real-world robot demonstrate the effectiveness and generalization ability of the proposed framework.
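The explore, assess, and retain step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Sample` and `collect_samples`, the scalar quality score, and the fixed `threshold` are all assumptions made for the example, and the detector and the embodied assessment are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Sample:
    viewpoint: int   # index of the viewpoint the observation came from
    grasp: tuple     # predicted grasp; the tuple layout is hypothetical
    quality: float   # embodied quality score (hypothetical scalar)


def collect_samples(
    viewpoints: Sequence[int],
    detect: Callable[[int], tuple],
    assess: Callable[[tuple], float],
    threshold: float = 0.5,
) -> List[Sample]:
    """Visit each viewpoint, score the predicted grasp, keep those that pass."""
    kept: List[Sample] = []
    for vp in viewpoints:
        grasp = detect(vp)       # stand-in for the grasp detection network
        score = assess(grasp)    # stand-in for the embodied assessment
        if score >= threshold:   # assumed rule: keep scores at or above threshold
            kept.append(Sample(vp, grasp, score))
    return kept
```

In a full system the retained samples would then drive the test-time update of the detector; that step is omitted here because the abstract does not specify it.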
