RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

Soroush Nasiriany, Sepehr Nasiriany, Abhiram Maddukuri, Yuke Zhu

TL;DR

RoboCasa365 is a comprehensive simulation benchmark for household mobile manipulation. It is designed to support systematic evaluation across several problem settings, including multi-task learning, robot foundation model training, and lifelong learning.

Abstract

Recent advances in robot learning have accelerated progress toward generalist robots that can perform everyday tasks in human environments. Yet it remains difficult to gauge how close we are to this vision. The field lacks a reproducible, large-scale benchmark for systematic evaluation. To fill this gap, we present RoboCasa365, a comprehensive simulation benchmark for household mobile manipulation. Built on the RoboCasa platform, RoboCasa365 introduces 365 everyday tasks across 2,500 diverse kitchen environments, with over 600 hours of human demonstration data and over 1,600 hours of synthetically generated demonstration data, making it one of the largest and most diverse resources for studying generalist policies. RoboCasa365 is designed to support systematic evaluations for different problem settings, including multi-task learning, robot foundation model training, and lifelong learning. We conduct extensive experiments on this benchmark with state-of-the-art methods and analyze the impacts of task diversity, dataset scale, and environment variation on generalization. Our results provide new insights into which factors most strongly affect the performance of generalist robots and inform strategies for future progress in the field.

Paper Structure (39 sections, 10 figures, 11 tables)

Figures (10)

  • Figure 1: Overview of RoboCasa365. RoboCasa365 is a large-scale simulation framework for training and benchmarking generalist robots. It includes 365 everyday tasks, 2,500 diverse kitchen scenes, over 600 hours of human demonstration data, plus 1,600 hours of synthetically generated demonstration data, and systematic benchmarks for training and evaluating generalist robot models.
  • Figure 2: Kitchen Scenes. Our simulation framework features 2,500 distinct kitchen scenes for pretraining (top, representative samples shown), and 10 distinct target kitchen scenes (bottom, all scenes shown).
  • Figure 3: Composite Tasks. RoboCasa365 features 300 composite tasks that involve a sequence of skills. We use large language models to generate a set of high-level activities, and for each activity, a set of task blueprints. There are 6 activity families (high-level categories) spanning 60 activities, which organize composite tasks based on shared functional and semantic structure. Representative tasks are shown for selected activities.
  • Figure 4: Distribution of task lengths (by number of subtasks) and dataset episode lengths (by number of seconds). We observe a long tail of tasks and data representing long-horizon behaviors.
  • Figure 5: Foundation Model Training Results. Pretraining enables more effective learning of downstream tasks with significant gains in data efficiency.
  • ...and 5 more figures
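
The counts quoted in the abstract and in the captions for Figures 1 to 3 imply a few simple averages and lower bounds. The sketch below works them out. The class and function names (BenchmarkCounts, derive) are hypothetical and not part of RoboCasa365. The "non-composite" count assumes that every task not counted among the 300 composite tasks is a shorter, non-composite task, which the text above does not state explicitly. The per-family and per-activity figures are averages only; the actual distribution may be uneven.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCounts:
    """Headline counts quoted in the abstract and figure captions."""
    total_tasks: int = 365
    composite_tasks: int = 300
    activity_families: int = 6
    activities: int = 60
    pretraining_scenes: int = 2500
    target_scenes: int = 10
    human_demo_hours_min: int = 600       # "over 600 hours", so a lower bound
    synthetic_demo_hours_min: int = 1600  # "over 1600 hours", so a lower bound


def derive(c: BenchmarkCounts) -> dict:
    """Derive simple averages and lower bounds from the quoted counts."""
    return {
        # Assumption: tasks not counted as composite are non-composite tasks.
        "non_composite_tasks": c.total_tasks - c.composite_tasks,
        # Averages only; the source does not say the split is even.
        "activities_per_family": c.activities / c.activity_families,
        "composite_tasks_per_activity": c.composite_tasks / c.activities,
        # Sum of two lower bounds, so itself only a lower bound.
        "total_demo_hours_min": c.human_demo_hours_min + c.synthetic_demo_hours_min,
    }


if __name__ == "__main__":
    for name, value in derive(BenchmarkCounts()).items():
        print(f"{name}: {value}")
```

Under these assumptions, the quoted numbers correspond to an average of 10 activities per family, 5 composite tasks per activity, and at least 2,200 hours of demonstration data in total.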