RoboBenchMart: Benchmarking Robots in Retail Environment
Konstantin Soshin, Alexander Krapukhin, Andrei Spiridonov, Denis Shepelev, Gregorii Bukhtuev, Andrey Kuznetsov, Vlad Shakhuro
TL;DR
RoboBenchMart targets the gap between existing tabletop benchmarks and real-world retail manipulation by providing a dark-store, cluttered-shelf simulation with procedural store layouts and end-to-end trajectory generation. It introduces a Store Plan Generator and a Store Trajectories Sampler to synthesize diverse scenes and demonstrations, and benchmarks generalist vision-language-action policies on atomic and composite retail tasks using the Fetch robot. The results reveal a clear performance gap for current generalist methods in retail, highlighting the need for retail-specific pretraining, task-aware policies, and broader evaluation scenarios. The open-source suite, including protocols, data, and baselines, is designed to accelerate robust, scalable robotic automation in near-term retail applications.
Abstract
Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and realistic benchmark designed for dark store environments, where robots must perform complex manipulation tasks with diverse grocery items. This setting presents significant challenges, including dense object clutter and varied spatial configurations, with items placed at different heights and depths and in close proximity to one another. By targeting the retail domain, our benchmark addresses a setting with strong potential for near-term automation impact. We demonstrate that current state-of-the-art generalist models struggle to solve even common retail tasks. To support further research, we release the RoboBenchMart suite, which includes a procedural store layout generator, a trajectory generation pipeline, evaluation tools, and fine-tuned baseline models.
