Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Kimin Lee, Kibok Lee, Jinwoo Shin, Honglak Lee

TL;DR

This work tackles the generalization gap in deep reinforcement learning when test environments present unseen visual patterns. It introduces Network Randomization, which perturbs input observations during training with a randomly initialized single-layer CNN whose weights are repeatedly re-initialized. It also combines the policy objective with a feature-matching loss that encourages representations invariant to these perturbations. At test time, Monte Carlo inference averages decisions over several random layers to reduce the variance that the randomization introduces. Across CoinRun, DeepMind Lab, and Surreal robotics, the method outperforms regularization and data augmentation baselines, achieving notable gains in unseen environments and producing more compact, task-relevant activations. The approach is simple, requires little simulator-specific engineering, and has potential implications for sim-to-real transfer, adversarial robustness, and dynamics generalization.
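The random perturbation layer can be illustrated with a minimal NumPy sketch. It assumes a 3×3 kernel, as many output channels as input channels, "same" zero padding, no bias, and Xavier (Glorot) normal weights that are redrawn every time an observation is randomized. These hyperparameters and the function names (`xavier_normal`, `random_conv`, `randomize_observation`) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def xavier_normal(shape, rng):
    """Glorot/Xavier normal init for a conv weight of shape (C_out, C_in, k, k)."""
    c_out, c_in, kh, kw = shape
    fan_in = c_in * kh * kw
    fan_out = c_out * kh * kw
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=shape)


def random_conv(obs, weight):
    """Apply one conv layer (cross-correlation, stride 1, 'same' zero padding, no bias).

    obs:    (H, W, C_in) float image
    weight: (C_out, C_in, k, k) kernel with odd k
    returns (H, W, C_out)
    """
    c_out, c_in, k, _ = weight.shape
    h, w, _ = obs.shape
    pad = k // 2
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad height and width
    out = np.zeros((h, w, c_out))
    for i in range(k):
        for j in range(k):
            # Shifted view of the input for kernel tap (i, j), mixed across channels.
            patch = padded[i:i + h, j:j + w, :]
            out += patch @ weight[:, :, i, j].T
    return out


def randomize_observation(obs, rng, kernel_size=3):
    """Hypothetical helper: draw fresh random weights and perturb one observation."""
    c = obs.shape[-1]
    weight = xavier_normal((c, c, kernel_size, kernel_size), rng)
    return random_conv(obs, weight)
```

Because the weights are redrawn on every call, each randomized copy of an observation gets different colors and local textures while its spatial layout stays roughly intact. Figure 1(a) shows this kind of variation.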

Abstract

Deep reinforcement learning (RL) agents often fail to generalize to unseen environments that are semantically similar to their training environments, particularly when they are trained on high-dimensional state spaces such as images. In this paper, we propose a simple technique to improve the generalization ability of deep RL agents by introducing a randomized (convolutional) neural network that randomly perturbs input observations. It enables trained agents to adapt to new domains by learning robust features that are invariant across varied and randomized environments. Furthermore, we consider an inference method based on Monte Carlo approximation to reduce the variance induced by this randomization. We demonstrate the superiority of our method across 2D CoinRun, 3D DeepMind Lab exploration, and 3D robotics control tasks: it significantly outperforms various regularization and data augmentation methods designed for the same purpose.
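The Monte Carlo inference step can be sketched as averaging the policy's action distribution over M independently drawn random layers: pi(a | s) ≈ (1/M) Σ_m pi(a | f(s; phi_m)). This minimal NumPy sketch makes two simplifying assumptions. First, a 1×1 random convolution (one random channel-mixing matrix applied at every pixel) stands in for the 3×3 layer. Second, `toy_policy` is a hypothetical linear policy. None of the function names come from the paper.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


def random_channel_mix(obs, rng):
    """Random 1x1 convolution: one Xavier-normal (C, C) matrix applied at every pixel."""
    c = obs.shape[-1]
    weight = rng.normal(0.0, np.sqrt(1.0 / c), size=(c, c))
    return obs @ weight


def toy_policy(obs, theta):
    """Hypothetical linear policy: flattened observation -> action probabilities."""
    return softmax(obs.reshape(-1) @ theta)


def mc_action_probs(obs, theta, num_samples, rng):
    """Monte Carlo estimate of E_phi[pi(a | f(s; phi))] using num_samples random layers."""
    samples = [toy_policy(random_channel_mix(obs, rng), theta) for _ in range(num_samples)]
    return np.mean(samples, axis=0)


# Usage: pick an action from the averaged distribution for one 4x4 RGB observation.
rng = np.random.default_rng(0)
obs = rng.random((4, 4, 3))
theta = rng.normal(size=(4 * 4 * 3, 5))  # 5 discrete actions (placeholder size)
probs = mc_action_probs(obs, theta, num_samples=10, rng=rng)
action = int(np.argmax(probs))
```

A larger M lowers the variance of the averaged decision but costs M forward passes per step. Figure 3(d) reports how success rate varies with the number of MC samples.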

Paper Structure

This paper contains 24 sections, 4 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: (a) Examples of randomized inputs (color values in each channel are normalized for visualization) generated by re-initializing the parameters of a random layer. Examples of seen and unseen environments on (b) CoinRun, (c) DeepMind Lab, and (d) Surreal robotics control.
  • Figure 2: Samples of dogs vs. cats dataset. The training set consists of bright dogs and dark cats, whereas the test set consists of dark dogs and bright cats.
  • Figure 3: (a) We collect multiple episodes from various environments using human demonstrators, then use t-SNE to visualize the hidden representations of agents trained with (b) PPO and (c) PPO + ours; point colors indicate the environment of each observation. (d) Average success rates for varying numbers of MC samples.
  • Figure 4: Visualization of activation maps via Grad-CAM in seen and unseen environments in the small-scale CoinRun. Images are aligned with similar states from various episodes for comparison.
  • Figure 5: Performance of trained agents in unseen environments under (a) large-scale CoinRun, (b) DeepMind Lab, and (c) Surreal robotics control. Solid/dashed lines and shaded regions represent the mean and standard deviation, respectively.
  • ...and 12 more figures
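The feature-matching loss mentioned in the TL;DR is meant to pull clean and randomized observations toward the shared, environment-independent hidden representations visualized in Figure 3(c). This minimal sketch assumes a squared L2 distance between the features of each clean observation and its randomized copy, added to the policy loss with a weight `beta`. The linear `encoder`, the 1×1 random convolution used to build the randomized batch, the placeholder policy loss, and the value of `beta` are all hypothetical, not the paper's settings. NumPy has no autograd, so the sketch only computes the loss value that a deep learning framework would then differentiate.

```python
import numpy as np


def encoder(obs, w_enc):
    """Hypothetical encoder: flatten the observation, apply a linear map and ReLU."""
    return np.maximum(obs.reshape(-1) @ w_enc, 0.0)


def feature_matching_loss(clean_batch, randomized_batch, w_enc):
    """Mean squared L2 distance between features of clean and randomized observations."""
    dists = [
        np.sum((encoder(r, w_enc) - encoder(c, w_enc)) ** 2)
        for c, r in zip(clean_batch, randomized_batch)
    ]
    return float(np.mean(dists))


def total_loss(policy_loss, fm_loss, beta):
    """Policy objective plus a beta-weighted feature-matching term."""
    return policy_loss + beta * fm_loss


# Usage with placeholder values.
rng = np.random.default_rng(0)
clean = rng.random((16, 4, 4, 3))                     # batch of 16 small RGB observations
mix = rng.normal(0.0, np.sqrt(1.0 / 3), size=(3, 3))  # 1x1 random conv shared by the batch
randomized = clean @ mix
w_enc = rng.normal(size=(4 * 4 * 3, 8))
fm = feature_matching_loss(clean, randomized, w_enc)
loss = total_loss(policy_loss=0.5, fm_loss=fm, beta=0.01)  # 0.5 and 0.01 are placeholders
```

Minimizing this term alongside the policy loss penalizes encoder features that change when only the visual randomization changes.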