RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi
TL;DR
RoboTHOR introduces an open, modular platform pairing simulated embodied agents with physical robots to study and benchmark simulation-to-real transfer in indoor visual navigation. The authors define a semantic navigation task, implement multiple baselines, and evaluate sim-to-sim and sim-to-real transfer, revealing a significant performance drop when moving from simulation to reality, attributed to gaps in appearance and control dynamics. Key analyses show perceptual and sensor-domain misalignments, camera-parameter sensitivity, and the ineffectiveness of naive image-translation domain adaptation. The work emphasizes RoboTHOR's potential to democratize and accelerate embodied AI research, and to make it more reproducible, by offering remote, scalable benchmarking across simulated and real environments.
Abstract
Visual recognition ecosystems (e.g., ImageNet, Pascal, COCO) have undeniably played a prevailing role in the evolution of modern computer vision. We argue that interactive and embodied visual AI has reached a stage of development similar to visual recognition prior to the advent of these ecosystems. Recently, various synthetic environments have been introduced to facilitate research in embodied AI. Notwithstanding this progress, the crucial question of how well models trained in simulation generalize to reality has remained largely unanswered. The creation of a comparable ecosystem for simulation-to-real embodied AI presents many challenges: (1) the inherently interactive nature of the problem, (2) the need for tight alignments between real and simulated worlds, (3) the difficulty of replicating physical conditions for repeatable experiments, and (4) the associated cost. In this paper, we introduce RoboTHOR to democratize research in interactive and embodied visual AI. RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world. As a first benchmark, our experiments show that models trained in simulation perform significantly worse when tested in the carefully constructed physical analogs than when tested in the simulated environments. We hope that RoboTHOR will spur the next stage of evolution in embodied computer vision. RoboTHOR can be accessed at the following link: https://ai2thor.allenai.org/robothor
