Emergence of In-Context Reinforcement Learning from Noise Distillation
Ilya Zisman, Vladislav Kurenkov, Alexander Nikulin, Viacheslav Sinii, Sergey Kolesnikov
TL;DR
This work tackles the data bottleneck in in-context Reinforcement Learning by introducing AD$^\varepsilon$, a noise-distillation curriculum that generates learning histories without requiring thousands of RL agents or access to an optimal policy. By gradually injecting noise into demonstrator policies, the method yields synthetic trajectories that encode progressive policy improvement, enabling a Transformer to distill an in-context learning algorithm from suboptimal data. The authors demonstrate emergent in-context RL on grid-world and pixel-based 3D tasks, with the in-context agent outperforming the best data policy by up to about $2\times$, and show robustness across suboptimal trajectories and varying learning pace. Overall, AD$^\varepsilon$ lowers data barriers to in-context RL and highlights learning-pace dynamics as a critical lever for generalization and adaptation in noisy learning histories.
Abstract
Recently, extensive studies in Reinforcement Learning have been carried out on the ability of transformers to adapt in-context to various environments and tasks. Current in-context RL methods are limited by their strict data requirements: the data must be generated by RL agents or labeled with actions from an optimal policy. To address this prevalent problem, we propose AD$^\varepsilon$, a new data acquisition approach that enables in-context Reinforcement Learning from a noise-induced curriculum. We show that it is viable to construct a synthetic noise injection curriculum that helps to obtain learning histories. Moreover, we experimentally demonstrate that it is possible to alleviate the need for generation using optimal policies, with in-context RL still able to outperform the best suboptimal policy in a learning dataset by a $2\times$ margin.
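The noise injection curriculum can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D environment, the `demonstrator_action` helper, the `noisy_history` function, and the linear decay of the noise level from 1 to 0 are all assumptions made for illustration. The idea shown is only that acting randomly with a probability that shrinks over episodes produces a sequence of trajectories that looks like a policy improving over time.

```python
import random


def demonstrator_action(pos, goal):
    """Hypothetical demonstrator: step toward the goal on a 1-D line."""
    return 1 if goal > pos else -1


def noisy_history(goal=5, size=10, n_episodes=20, horizon=15, seed=0):
    """Build one synthetic learning history with a decaying noise level.

    The noise level eps falls linearly (an assumed schedule) from 1.0,
    where every action is random, to 0.0, where every action comes from
    the demonstrator.
    """
    rng = random.Random(seed)
    history, returns, epsilons = [], [], []
    for ep in range(n_episodes):
        eps = 1.0 - ep / (n_episodes - 1)
        pos, ep_return = 0, 0.0
        for _ in range(horizon):
            # With probability eps take a random action, else follow the demonstrator.
            if rng.random() < eps:
                action = rng.choice([-1, 1])
            else:
                action = demonstrator_action(pos, goal)
            next_pos = min(max(pos + action, 0), size - 1)
            reward = 1.0 if next_pos == goal else 0.0
            history.append((pos, action, reward))
            ep_return += reward
            pos = next_pos
            if reward > 0:
                break  # the episode ends once the goal is reached
        returns.append(ep_return)
        epsilons.append(eps)
    return history, returns, epsilons


if __name__ == "__main__":
    history, returns, epsilons = noisy_history()
    for eps, ret in zip(epsilons, returns):
        print(f"eps={eps:.2f} return={ret:.0f}")
```

In this toy setting the concatenated `(state, action, reward)` tuples, ordered by decreasing noise, stand in for a learning history that a sequence model could be trained on. The demonstrator here happens to be optimal for the toy task; the abstract's point is that the same construction is applied when only suboptimal demonstrators are available.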
