The Indoor-Training Effect: unexpected gains from distribution shifts in the transition function
Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, Gabriel Kreiman
TL;DR
The paper investigates how distribution shifts in the transition dynamics affect generalization in reinforcement learning. It introduces Noise Injection, which creates δ-environments $M_\delta$ around a target MDP $M_T$ by adding a controlled amount of noise δ to the transition function. The authors compare two agents, both tested on $M_\delta$: a Learnability agent trained on $M_\delta$ itself, and a Generalization agent trained on the noise-free $M_T$. The comparison reveals the Indoor-Training Effect: in many cases, training in the noise-free environment yields better test performance under noise than training in the noisy environment. This counterintuitive finding holds across 60 MDP variations in three ATARI domains (PacMan, Pong, Breakout) and extends to both semantic and non-semantic changes; differences in the agents' exploration patterns predict the performance gap. The results persist in deep RL (DQN), suggesting that simple, controlled training environments can foster more robust policies for noisy deployment, with implications for robotics and transfer learning. Limitations include the restriction to ATARI domains and the focus on classical RL; future work could extend the analysis to real-world settings and a broader range of deep RL algorithms.
Abstract
Is it better to perform tennis training in a pristine indoor environment or a noisy outdoor one? To model this problem, we investigate whether shifts in the transition probabilities between the training and testing environments in reinforcement learning problems can lead to better performance under certain conditions. We generate new Markov Decision Processes (MDPs) from a given MDP by adding quantifiable, parametric noise to the transition function. We refer to this process as Noise Injection and to the resulting environments as δ-environments. This process allows us to create variations of the same environment with quantitative control over the noise level, which serves as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. In stark contrast, we observe that agents can perform better when trained on the noise-free environment and tested on the noisy δ-environments than when trained and tested on the same δ-environments. We confirm that this finding extends beyond noise variations: the same phenomenon can be shown in ATARI game variations, including varied Ghost behaviour in PacMan and Paddle behaviour in Pong. We demonstrate this intriguing behaviour across 60 different variations of ATARI games, including PacMan, Pong, and Breakout. We refer to this phenomenon as the Indoor-Training Effect. Code to reproduce our experiments and to implement Noise Injection can be found at https://bit.ly/3X6CTYk.
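To make the idea of Noise Injection concrete, the following is a minimal sketch for a small tabular MDP. It is not the authors' implementation (that is available at the link above). It assumes one simple form of parametric noise: mixing each transition distribution with a uniform distribution over next states, weighted by δ. The names `inject_noise`, `max_total_variation`, `P_T`, and `P_delta` are hypothetical and chosen only for illustration.

```python
import numpy as np


def inject_noise(P, delta):
    """Build the transition tensor of a delta-environment (illustrative only).

    P has shape (S, A, S): P[s, a] is a distribution over next states.
    ASSUMPTION: noise is a convex mix with the uniform distribution;
    the paper's actual noise model may differ.
    """
    P = np.asarray(P, dtype=float)
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    n_states = P.shape[-1]
    uniform = np.full_like(P, 1.0 / n_states)
    # A convex combination of distributions is again a distribution.
    return (1.0 - delta) * P + delta * uniform


def max_total_variation(P, Q):
    """Largest total-variation distance over all (state, action) pairs."""
    return 0.5 * np.abs(np.asarray(P) - np.asarray(Q)).sum(axis=-1).max()


# Toy target MDP M_T: 2 states, 1 action, deterministic self-loops.
P_T = np.array([[[1.0, 0.0]],
                [[0.0, 1.0]]])

# A delta-environment M_delta with delta = 0.2.
P_delta = inject_noise(P_T, delta=0.2)
```

Under this assumed noise model, δ directly controls how far the δ-environment is from the target: for the deterministic toy MDP above, the largest total-variation distance between `P_T` and `P_delta` is δ(1 - 1/S), which is 0.1 for δ = 0.2 and S = 2. This illustrates the sense in which the noise level can act as a distance between environments.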
