Good Grasps Only: A data engine for self-supervised fine-tuning of pose estimation using grasp poses for verification
Frederik Hagelskjær
TL;DR
This work introduces a data engine for online, self-supervised fine-tuning of robot pose estimation in bin-picking. It fuses zero-shot pose estimation (KeyMatchNet) with in-hand pose verification to automatically generate labeled data during task execution, enabling continuous improvement without a separate training phase. Experiments on four cylindrical objects show that the self-supervised loop outperforms a CAD-trained baseline and generalizes to unseen objects, while maintaining robust grasping and enabling improved insertion. The approach reduces setup time and offers a practical path toward adaptable, self-tuning robotic manipulation in flexible manufacturing settings.
Abstract
In this paper, we present a novel method for self-supervised fine-tuning of pose estimation. Leveraging zero-shot pose estimation, our approach enables the robot to obtain training data automatically, without manual labeling. After pose estimation, the object is grasped, and in-hand pose estimation is used to validate the data. Our pipeline allows the system to fine-tune while the process is running, removing the need for a separate learning phase. The motivation behind our work is the need for rapid setup of pose estimation solutions. Specifically, we address the challenging task of bin picking, which plays a pivotal role in flexible robotic setups. Our method is implemented in a robotic work cell and tested with four different objects. For all objects, our method improves performance and outperforms a state-of-the-art method trained on the CAD models of the objects. Project page available at gogoengine.github.io
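The verify-then-collect loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Pose` representation (a position plus a unit main-axis direction), the `DataEngine` class, the tolerance values, and the `fine_tune_every` parameter are all hypothetical, and the fine-tuning step is a counter standing in for a real network update.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pose:
    """Simplified pose (assumption): position in metres and a unit main-axis direction."""
    position: tuple
    axis: tuple


def position_error(a: Pose, b: Pose) -> float:
    """Euclidean distance between the two positions."""
    return math.dist(a.position, b.position)


def axis_angle_error(a: Pose, b: Pose) -> float:
    """Angle in degrees between the two axes (both assumed to be unit vectors)."""
    dot = sum(x * y for x, y in zip(a.axis, b.axis))
    dot = max(-1.0, min(1.0, dot))  # guard acos against rounding error
    return math.degrees(math.acos(dot))


def grasp_is_verified(expected: Pose, observed: Pose,
                      max_pos_m: float = 0.005,
                      max_angle_deg: float = 5.0) -> bool:
    """Accept a sample only if the in-hand pose matches the expected one.

    The tolerances are illustrative placeholders, not values from the paper.
    """
    return (position_error(expected, observed) <= max_pos_m
            and axis_angle_error(expected, observed) <= max_angle_deg)


@dataclass
class DataEngine:
    """Collects verified (scene, pose) pairs and triggers periodic fine-tuning."""
    fine_tune_every: int = 2  # hypothetical schedule
    dataset: list = field(default_factory=list)
    fine_tune_calls: int = 0

    def process_pick(self, scene, estimated_pose: Pose,
                     expected_in_hand: Pose, observed_in_hand: Pose) -> bool:
        # Discard samples whose in-hand pose disagrees with the prediction.
        if not grasp_is_verified(expected_in_hand, observed_in_hand):
            return False
        # Verified: the bin-pose estimate becomes a training label for this scene.
        self.dataset.append((scene, estimated_pose))
        if len(self.dataset) % self.fine_tune_every == 0:
            self.fine_tune_calls += 1  # placeholder for a real fine-tuning step
        return True
```

In this sketch, only picks that pass verification contribute labels, so a wrong zero-shot estimate is filtered out before it can reach the training set.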
