The One RING: a Robotic Indoor Navigation Generalist

Ainaz Eftekhar, Rose Hendrix, Luca Weihs, Jiafei Duan, Ege Caglar, Jordi Salvador, Alvaro Herrasti, Winson Han, Eli VanderBil, Aniruddha Kembhavi, Ali Farhadi, Ranjay Krishna, Kiana Ehsani, Kuo-Hao Zeng

TL;DR

This paper introduces RING (Robotic Indoor Navigation Generalist), an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator, and leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms.

Abstract

Modern robots vary significantly in shape, size, and sensor configurations used to perceive and interact with their environments. However, most navigation policies are embodiment-specific--a policy trained on one robot typically fails to generalize to another, even with minor changes in body size or camera viewpoint. As custom hardware becomes increasingly common, there is a growing need for a single policy that generalizes across embodiments, eliminating the need to retrain for each specific robot. In this paper, we introduce RING (Robotic Indoor Navigation Generalist), an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator. Trained entirely in simulation, RING leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms. To support this, we augment the AI2-THOR simulator to instantiate robots with controllable configurations, varying in body size, rotation pivot point, and camera parameters. On the visual object-goal navigation task, RING achieves strong cross-embodiment (XE) generalization--72.1% average success rate across five simulated embodiments (a 16.7% absolute improvement on the Chores-S benchmark) and 78.9% across four real-world platforms, including Stretch RE-1, LoCoBot, and Unitree Go1--matching or even surpassing embodiment-specific policies. We further deploy RING on the RB-Y1 wheeled humanoid in a real-world kitchen environment, showcasing its out-of-the-box potential for mobile manipulation platforms. (Project website: https://one-ring-policy.allen.ai)
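The embodiment randomization described in the abstract (body size, rotation pivot point, camera parameters) can be pictured with a minimal sketch. This is not the authors' implementation and does not call any AI2-THOR API. The `EmbodimentConfig` class, its field names, and every numeric range below are hypothetical, chosen only to illustrate drawing many distinct robot configurations from one sampler.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbodimentConfig:
    """Hypothetical container for one randomized robot embodiment."""
    body_width_m: float
    body_depth_m: float
    body_height_m: float
    pivot_offset_m: float    # rotation pivot offset along the forward axis
    camera_height_m: float
    camera_pitch_deg: float
    camera_fov_deg: float


def sample_embodiment(rng: random.Random) -> EmbodimentConfig:
    """Draw one embodiment; all ranges are illustrative assumptions."""
    width = rng.uniform(0.2, 0.5)
    depth = rng.uniform(0.2, 0.5)
    height = rng.uniform(0.3, 1.5)
    return EmbodimentConfig(
        body_width_m=width,
        body_depth_m=depth,
        body_height_m=height,
        # Keep the rotation pivot inside the body footprint.
        pivot_offset_m=rng.uniform(-depth / 2, depth / 2),
        # Mount the camera somewhere on the body, never above it.
        camera_height_m=rng.uniform(0.1, height),
        camera_pitch_deg=rng.uniform(-20.0, 40.0),
        camera_fov_deg=rng.uniform(40.0, 120.0),
    )


def sample_embodiments(n: int, seed: int = 0) -> list[EmbodimentConfig]:
    """Draw n embodiments reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_embodiment(rng) for _ in range(n)]
```

Calling `sample_embodiments(1_000_000)` would produce a pool on the scale the paper reports for training. Coupling the camera and pivot ranges to the sampled body size is a design choice of this sketch, made so that each configuration stays physically plausible.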

Paper Structure

This paper contains 25 sections, 1 equation, 14 figures, 15 tables.

Figures (14)

  • Figure 1: We show that training on one million randomly generated embodiments in simulation (varying camera configurations, body size, and rotation pivot point) results in RING, a generalist navigation policy that works across various robot embodiments in the real world. (A) A t-SNE visualization of embodiment parameters for 30k random agents and three real robots (we do not train on any real robot embodiment parameters). Egocentric views from the first camera are shown for 10 sample agents. (B) RING transfers zero-shot to a wide range of embodiments in the real world, including Stretch RE-1, LoCoBot, Unitree Go1, and the RB-Y1 wheeled humanoid. (C) The policy displays embodiment-adaptive behavior, adjusting its navigation strategy based on its embodiment.
  • Figure 2: Different embodiments exhibit different behaviors. We show the egocentric view from the main camera and a third-person view of the two agents (white boxes indicate the robot colliders). Embodiment B can go under the table to reach the chair, but Embodiment A collides with the table and has to go around it.
  • Figure 3: RING model architecture. The model takes visual observations and a language instruction as inputs and predicts an action to execute. During RL finetuning, RING also predicts a value estimate.
  • Figure 4: RING exhibits embodiment-adaptive behavior, adjusting its navigation strategy based on the robot's physical configuration. The shorter quadruped robot (B) walks under the bed, while the taller Stretch RE-1 (A) navigates around it. In (C), an agent with the same height as Stretch RE-1 but a lower camera position initially attempts to go under the bed, mistakenly assuming it can fit. After a collision, it adapts and reroutes around the bed, similar to Stretch RE-1.
  • Figure 5: Embodiment-Specialized Adaptation. RING, pretrained on randomized embodiments, adapts efficiently to individual robots with minimal fine-tuning.
  • ...and 9 more figures