ActiveGrasp: Information-Guided Active Grasping with Calibrated Energy-based Model
Boshu Lei, Wen Jiang, Kostas Daniilidis
TL;DR
ActiveGrasp tackles robotic grasping in clutter by formulating next-best-view (NBV) selection as maximizing the information gain derived from the grasp pose distribution on the $SE(3)$ manifold. It introduces a calibrated energy-based model (EBM) that captures the multimodal grasp distribution and aligns the energy with actual grasp success through a learnable temperature and tailored losses, enabling reliable entropy-based planning. The approach combines Gaussian Posterior approximations (GAP), denoising score matching on $SE(3)$, and a calibrated grasp generation pipeline to estimate information gain and guide view selection under limited view budgets. Experiments in simulation and on real robot setups show higher grasp success rates and lower calibration error than state-of-the-art baselines, and the simulated environment is offered as a reproducible benchmark. The work provides a principled framework for active perception in manipulation, and the authors state that the source code will be made public when the paper is released.
Abstract
Grasping in densely cluttered environments is a challenging task for robots. Previous methods address this problem by actively gathering multiple views before generating grasp poses. However, they either overlook the importance of the grasp distribution for information gain estimation or rely on a projection of the grasp distribution, which ignores the structure of grasp poses on the SE(3) manifold. To tackle these challenges, we propose a calibrated energy-based model for grasp pose generation and an active view selection method that estimates information gain from the grasp distribution. Our energy-based model captures the multimodal nature of the grasp distribution on the SE(3) manifold. The energy level is calibrated to the grasp success rate so that the predicted distribution aligns with the real distribution. The next best view is selected by estimating the information gain for grasping from the calibrated distribution conditioned on the reconstructed environment, which efficiently drives the robot to explore the parts of the target object that afford grasps. Experiments in simulated environments and on real robot setups demonstrate that, under limited view budgets, our model grasps objects in cluttered environments more successfully than previous state-of-the-art models. Our simulated environment can serve as a reproducible platform for future research on active grasping. The source code will be made public when the paper is released.
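
To make the calibration and view-selection ideas above concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a sigmoid link between energy and success probability with a scalar temperature, uses the standard binned expected calibration error (ECE) as the calibration metric, and scores candidate views with a simple summed binary-entropy proxy instead of the information gain on SE(3) described in the abstract. All function names (`success_probability`, `binary_entropy`, `expected_calibration_error`, `pick_view`) and the input layout are hypothetical.

```python
import math


def success_probability(energy, temperature=1.0):
    """Map a grasp energy to a success probability (assumed sigmoid link).

    Lower energy means a more likely successful grasp. The temperature
    rescales the energy and stands in for the learnable calibration parameter.
    """
    z = -energy / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Standard binned ECE: weighted mean |confidence - empirical success rate|."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        success_rate = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(mean_conf - success_rate)
    return ece


def pick_view(view_energies, temperature=1.0):
    """Pick the candidate view with the largest summed grasp-outcome entropy.

    view_energies maps a view name to the energies of the grasp candidates
    that the view would observe (a hypothetical input format). This is a
    heuristic proxy for information gain, not the paper's estimator.
    """
    def score(energies):
        return sum(binary_entropy(success_probability(e, temperature))
                   for e in energies)
    return max(view_energies, key=lambda v: score(view_energies[v]))
```

In this sketch, a well-calibrated model yields an ECE near zero, so the entropy of the predicted success probabilities can be trusted as an uncertainty measure when ranking views. A miscalibrated model would make the entropy scores, and therefore the chosen view, unreliable.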
