ReLIC: A Recipe for 64k Steps of In-Context Reinforcement Learning for Embodied AI

Ahmad Elawady, Gunjan Chhablani, Ram Ramrakhya, Karmesh Yadav, Dhruv Batra, Zsolt Kira, Andrew Szot

TL;DR

ReLIC tackles rapid adaptation in embodied AI by enabling in-context reinforcement learning over long histories. It introduces a partial-update scheme and Sink-KV attention to scale transformer-based policies to contexts of up to 64,000 steps, trained with on-policy PPO on self-generated data. Empirically, ReLIC outperforms meta-RL baselines on ExtObjNav, exhibits emergent few-shot imitation, and shows that both partial updates and Sink-KV are critical for effective in-context learning. This work suggests that large-scale RL training combined with specialized attention mechanisms can enable robust, long-horizon in-context adaptation in embodied agents; code is available on GitHub.
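The partial-update scheme can be pictured as a change in when PPO updates happen inside one long trial. A minimal sketch, assuming the scheme means updating the policy after every short rollout of a trial (loss computed only on the newest rollout, earlier steps of the same trial kept as context) rather than once after the whole trial has been collected. `UpdateWindow`, both schedule functions, and the 4,096/256 step counts are hypothetical illustrations of the bookkeeping, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UpdateWindow:
    """Hypothetical bookkeeping for one PPO update inside a trial."""
    context_start: int  # first trial step the policy may attend to
    loss_start: int     # first step whose PPO loss is computed
    loss_end: int       # one past the last step whose loss is computed


def full_trial_schedule(trial_len: int) -> List[UpdateWindow]:
    # Baseline: collect the entire trial, then run a single update on it.
    return [UpdateWindow(0, 0, trial_len)]


def partial_update_schedule(trial_len: int, rollout_len: int) -> List[UpdateWindow]:
    # Assumed partial-update scheme: update after every rollout of
    # `rollout_len` steps, computing the loss only on the newest rollout
    # while all earlier steps of the same trial remain visible as context.
    windows = []
    for start in range(0, trial_len, rollout_len):
        end = min(start + rollout_len, trial_len)
        windows.append(UpdateWindow(context_start=0, loss_start=start, loss_end=end))
    return windows


if __name__ == "__main__":
    trial_len, rollout_len = 4096, 256  # illustrative values only
    print(len(full_trial_schedule(trial_len)))                   # 1 update per trial
    print(len(partial_update_schedule(trial_len, rollout_len)))  # 16 updates per trial
    # Last window: loss on steps 3840..4095, context from step 0.
    print(partial_update_schedule(trial_len, rollout_len)[-1])
```

Under these assumptions, a 4,096-step trial yields 16 policy updates instead of 1, and every update still conditions on the full history of the trial so far.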

Abstract

Intelligent embodied agents need to quickly adapt to new scenarios by integrating long histories of experience into decision-making. For instance, a robot in an unfamiliar house initially wouldn't know the locations of objects needed for tasks and might perform inefficiently. However, as it gathers more experience, it should learn the layout of its environment and remember where objects are, allowing it to complete new tasks more efficiently. To enable such rapid adaptation to new tasks, we present ReLIC, a new approach for in-context reinforcement learning (RL) for embodied agents. With ReLIC, agents are capable of adapting to new environments using 64,000 steps of in-context experience with full attention while being trained through self-generated experience via RL. We achieve this by proposing a novel policy update scheme for on-policy RL called "partial updates" as well as a Sink-KV mechanism that enables effective utilization of a long observation history for embodied agents. Our method outperforms a variety of meta-RL baselines in adapting to unseen houses in an embodied multi-object navigation task. In addition, we find that ReLIC is capable of few-shot imitation learning despite never being trained with expert demonstrations. We also provide a comprehensive analysis of ReLIC, highlighting that the combination of large-scale RL training, the proposed partial-updates scheme, and Sink-KV is essential for effective in-context learning. The code for ReLIC and all our experiments is at https://github.com/aielawady/relic
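The Sink-KV mechanism is credited with making a long observation history usable. A minimal single-head sketch, assuming Sink-KV means one learnable key/value pair prepended to the keys and values of each attention layer (the exact parameterization in ReLIC may differ): the extra slot gives each query somewhere to put attention mass when no past observation is relevant, instead of softmax forcing that mass onto history tokens. Vectors are plain Python lists, and `sink_kv_attention` is a hypothetical name.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sink_kv_attention(query: Vector, keys: List[Vector], values: List[Vector],
                      sink_key: Vector, sink_value: Vector) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention with a sink key/value pair prepended.

    Returns (output, weights); weights[0] is the mass assigned to the sink.
    In a trained model, sink_key/sink_value would be learned parameters
    (an assumption of this sketch).
    """
    all_keys = [sink_key] + keys
    all_values = [sink_value] + values
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in all_keys])
    dim = len(all_values[0])
    output = [sum(w * v[i] for w, v in zip(weights, all_values)) for i in range(dim)]
    return output, weights


if __name__ == "__main__":
    # A query aligned with the sink key sends almost all attention to the sink,
    # so the irrelevant history values barely affect the output.
    out, w = sink_kv_attention(
        query=[10.0, 0.0],
        keys=[[0.0, 1.0], [0.0, -1.0]],
        values=[[1.0, 1.0], [2.0, 2.0]],
        sink_key=[10.0, 0.0],
        sink_value=[0.0, 0.0],
    )
    print(round(w[0], 4), out)
```

With a zero `sink_value`, a head that attends mostly to the sink contributes almost nothing, acting like a learned no-op; whether the sink value is learned or fixed is a design choice this sketch leaves open.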

Paper Structure (31 sections, 15 figures, 2 tables)

Figures (15)

  • Figure 1: Overview of the ReLIC approach and problem setup. ReLIC learns a "pixels-to-actions" policy from reward alone via reinforcement learning, and the resulting policy adapts in context to new tasks at test time. The figure shows the trained ReLIC policy finding objects in an unseen house. In early episodes, the agent explores randomly to find the small target object because the scene is new, but after 64k steps of visual observations, ReLIC navigates efficiently to new target objects.
  • Figure 2: Comparing the in-context learning capability of ReLIC and baselines on ExtObjNav. The x-axis shows the number of episodes in the trial, and the y-axis shows the success or efficiency at that episode count. Agents capable of in-context learning improve in success and efficiency as they encounter more episodes. Each method is run with 3 random seeds and evaluated on 10k distinct sequences. Error bars are standard deviations of trial outcomes across the 3 seeds.
  • Figure 3: Analyzing ReLIC ICL capabilities. (a) More RL training yields agents with higher base success and stronger ICL capabilities; error bars give the standard error over the evaluation episodes. (b) Partial updates are important in ReLIC. (c) Sink-KV is important for learning speed and stability. To save compute, the results in (b) and (c) use the smaller, easier ReplicaCAD scenes, so their overall success rates are higher.
  • Figure 4: (a) ReLIC trained with context length 4k generalizes to operating with 32k steps of in-context experience in a new home layout. (b) ReLIC trained at 64k context length shows ICL abilities over 175 episodes. (c) ReLIC can do few-shot imitation learning despite not being trained for it. The error bars represent the standard error.
  • Figure 5: ICL comparison of ReLIC and baselines in the Darkroom and Miniworld tasks. ReLIC has a higher base performance and adapts to new tasks with less experience. The baseline numbers are taken from Figures 4b and 4d of Lee et al. (2023).
  • ...and 10 more figures
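The Figure 2 evaluation (success per episode index within a trial, 3 seeds, 10k sequences, error bars as standard deviations across seeds) can be summarized by a small aggregation. A minimal sketch, assuming each seed's curve is first averaged over its evaluation sequences and the error bar is the sample standard deviation of those per-seed means; `icl_curve` and the nested `results[seed][sequence][episode]` layout are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def icl_curve(results: Dict[int, List[List[float]]]) -> List[Tuple[float, float]]:
    """Aggregate results[seed][sequence][episode] into (mean, std) per episode.

    Assumption: average over sequences within a seed first, then take the
    mean and sample standard deviation across seeds for each episode index.
    """
    per_seed_curves = []
    for sequences in results.values():
        n_episodes = len(sequences[0])
        curve = [mean(seq[e] for seq in sequences) for e in range(n_episodes)]
        per_seed_curves.append(curve)

    summary = []
    for e in range(len(per_seed_curves[0])):
        seed_values = [curve[e] for curve in per_seed_curves]
        summary.append((mean(seed_values), stdev(seed_values)))
    return summary


if __name__ == "__main__":
    # Toy data: 3 seeds x 2 sequences x 2 episodes of binary success.
    toy = {
        0: [[0.0, 1.0], [0.0, 1.0]],
        1: [[1.0, 1.0], [0.0, 1.0]],
        2: [[0.0, 1.0], [1.0, 1.0]],
    }
    for episode, (m, s) in enumerate(icl_curve(toy), start=1):
        print(f"episode {episode}: success {m:.3f} +/- {s:.3f}")
```

An agent with in-context learning should show a mean that rises with the episode index, as the Figure 2 caption describes.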