RoboBenchMart: Benchmarking Robots in Retail Environment

Konstantin Soshin, Alexander Krapukhin, Andrei Spiridonov, Denis Shepelev, Gregorii Bukhtuev, Andrey Kuznetsov, Vlad Shakhuro

TL;DR

RoboBenchMart targets the gap between existing tabletop benchmarks and real-world retail manipulation by providing a dark-store, cluttered-shelf simulation with procedural store layouts and end-to-end trajectory generation. It introduces a Store Plan Generator and a Store Trajectories Sampler to synthesize diverse scenes and demonstrations, and benchmarks generalist vision-language-action policies on atomic and composite retail tasks using the Fetch robot. The results reveal a clear performance gap for current generalist methods in retail, highlighting the need for retail-specific pretraining, task-aware policies, and broader evaluation scenarios. The open-source suite, including protocols, data, and baselines, is designed to accelerate robust, scalable robotic automation in near-term retail applications.

Abstract

Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To address this limitation, we introduce RoboBenchMart, a more challenging and realistic benchmark designed for dark store environments, where robots must perform complex manipulation tasks with diverse grocery items. This setting presents significant challenges, including dense object clutter and varied spatial configurations, with items positioned at different heights and depths and in close proximity to one another. By targeting the retail domain, our benchmark addresses a setting with strong potential for near-term automation impact. We demonstrate that current state-of-the-art generalist models struggle to solve even common retail tasks. To support further research, we release the RoboBenchMart suite, which includes a procedural store layout generator, a trajectory generation pipeline, evaluation tools, and fine-tuned baseline models.

Paper Structure

This paper contains 32 sections, 10 figures, 8 tables.

Figures (10)

  • Figure 1: RoboBenchMart in action: the Fetch robot operates in a realistic, cluttered retail environment.
  • Figure 2: Examples of generated stores with fixtures arranged by our pipeline.
  • Figure 3: Example of product arrangement and shelf depletion over time produced by our simulator.
  • Figure 4: Examples of collected product assets.
  • Figure 5: Examples of ceiling, wall, and floor textures used in our store generation pipeline, illustrating just a subset of possible variations.
  • ...and 5 more figures