Benchmarking Multi-Object Grasping
Tianze Chen, Ricardo Frumento, Giulia Pagnanelli, Gianmarco Cei, Villa Keth, Shahaddin Gafarov, Jian Gong, Zihe Ye, Marco Baracca, Salvatore D'Avella, Matteo Bianchi, Yu Sun
TL;DR
The paper tackles the lack of standardized evaluation tools for multi-object grasping by introducing three benchmarking protocols—Only-Pick-Once (OPO), Accurate pick-transferring (APT), and Pick-transferring-all (PTA)—applied to both pile and surface object arrangements. It details robot and object setups, defines concrete evaluation metrics (PA, OSR, AR, CGPU), and provides baseline results using three representative hands (Barrett, Robotiq, SoftHand-2) plus human performance for reference. The work demonstrates how OPO captures single-round accuracy, APT scales to sequential transfers, and PTA evaluates full scene clearing, highlighting efficiency and reliability trends as task complexity increases. By offering a standardized framework and detailed baselines, the study enables reproducible comparisons and identifies key research gaps in perception, planning, and dexterous manipulation for multi-object grasping in real-world settings.
Abstract
In this work, we describe a multi-object grasping benchmark to evaluate the grasping and manipulation capabilities of robotic systems in both pile and surface scenarios. The benchmark introduces three multi-object grasping protocols, each designed to challenge a different aspect of robotic manipulation: 1) the Only-Pick-Once protocol, which assesses the robot's ability to efficiently pick multiple objects in a single attempt; 2) the Accurate pick-transferring protocol, which evaluates the robot's capacity to selectively grasp and transport a specific number of objects from a cluttered environment; and 3) the Pick-transferring-all protocol, which challenges the robot to clear an entire scene by sequentially grasping and transferring all available objects. These protocols are intended to be adopted by the broader robotics research community, providing a standardized method to assess and compare robotic systems' performance in multi-object grasping tasks. We establish baselines for these protocols using standard planning and perception algorithms on a Barrett hand, a Robotiq parallel jaw gripper, and the Pisa/IIT SoftHand-2, which is a soft underactuated robotic hand. We also discuss the results in relation to human performance in similar tasks.
