ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim, Jinyeon Kim, Wonje Jeung, Jonghyun Choi
TL;DR
ReALFRED addresses the gap between synthetic embodied AI benchmarks and real-world deployment by providing 3D-captured, object-interactable, multi-room environments with free-form language instructions. The authors build a large-scale dataset (150 houses, 114 object types, 30,696 directives) with expert demonstrations generated by a PDDL-based planner, and evaluate multiple baselines, including sim-to-real and real-to-real transfer with GAN-based domain adaptation. Results show that state-of-the-art methods struggle with ReALFRED's realism and scale, motivating new approaches and highlighting the importance of realistic data for robust instruction following. The benchmark, data, and code are public and aim to accelerate progress toward deployable, language-driven robotic agents.
Abstract
Simulated virtual environments have been widely used to train robotic agents that perform daily household tasks. These environments have driven research progress so far, but they often provide limited object interactability, visual appearance that differs from real-world environments, or relatively small environment sizes. This prevents models learned in virtual scenes from being readily deployable. To bridge the gap between these learning environments and deployment (i.e., real) environments, we propose the ReALFRED benchmark, which employs real-world scenes, objects, and room layouts to train agents to complete household tasks by understanding free-form language instructions and interacting with objects in large, multi-room, 3D-captured scenes. Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps. With ReALFRED, we analyze methods previously crafted for the ALFRED benchmark and observe that they consistently yield lower performance in all metrics, encouraging the community to develop methods in more realistic environments. Our code and data are publicly available.
