Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

Yuhan Zhang, Guoqing Ma, Guangfu Hao, Liangxuan Guo, Yang Chen, Shan Yu

TL;DR

This paper tackles the generalization bottleneck of vision-based RL by decoupling representation learning from policy optimization through the Adaptively Pretrained visual Encoder (APE). APE pretrains a fixed image encoder on a broad image distribution using adaptive data augmentations with a contrastive objective, then uses the learned representations in a DreamerV3-based world model with only a few additional environment interactions. Across DeepMind Control Suite, Atari 100k, and Memory Maze, APE delivers state-of-the-art performance for several backbones and substantially improves sample efficiency, approaching state-based SAC on some tasks. The results underscore the value of adaptive pretraining on diverse visual data for improving generalization and data efficiency in visual RL, without auxiliary supervisory signals during policy learning.

Abstract

While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One reason is that the visual encoders used are task-dependent, which prevents effective feature extraction in different settings. To address this issue, recent studies have tried to pretrain encoders with diverse visual inputs in order to improve their performance. However, they rely on existing pretrained encoders without further exploring the impact of the pretraining period. In this work, we propose APE: efficient reinforcement learning through Adaptively Pretrained visual Encoder, a framework that uses an adaptive augmentation strategy during the pretraining phase and extracts generalizable features with only a few interactions within the task environments during the policy learning period. Experiments are conducted across various domains, including the DeepMind Control Suite, Atari Games, and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. In addition, APE significantly improves sample efficiency using only visual inputs during learning, approaching the efficiency of state-based methods in several control tasks. These findings demonstrate the potential of adaptively pretraining the encoder to enhance the generalization ability and efficiency of visual RL algorithms.

Paper Structure

This paper contains 36 sections, 11 equations, 15 figures, 7 tables, 1 algorithm.

Figures (15)

  • Figure 1: Visualization of a ResNet-18 model under different pretraining strategies using LayerCAM [9462463], which indicates that APE extracts a more precise outline of the Walker than the other initialization settings. The first row displays the pure feature maps, which are also overlaid on the image in the second row.
  • Figure 1: Tasks across three different domains are included in our paper to evaluate the effectiveness of APE.
  • Figure 2: APE pipeline for MBRL. The training phase is divided into two parts, namely the Adaptive Pretraining period (within the blue area) and the Downstream Policy Learning period (within the yellow area). A wide variety of real-world images are augmented using an adaptive data augmentation strategy in the first period, which dynamically updates the sampling probability of each augmentation composition in the next pretraining epoch. In the second stage, the pretrained vision encoder is implemented in a generic RL framework as a perception module for the policy.
  • Figure 2: Visualization of reconstructions at different phases of the policy learning period on DMC walker walk. The first row in each stage shows the real states of the agent, while the second row depicts the predictions reconstructed by the latent dynamics. The third row displays the prediction accuracy by comparing the outlines of the actual states with the predicted ones.
  • Figure 3: Training curves for DMC vision benchmarks.
  • ...and 10 more figures
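
The caption of the APE pipeline figure says that the adaptive augmentation strategy updates the sampling probability of each augmentation composition for the next pretraining epoch, but it does not give the update rule. The sketch below illustrates one way such a loop could look. Everything specific in it is an assumption made for illustration: the softmax-over-scores rule, the `temperature` parameter, the composition names, and the random placeholder scores are hypothetical and are not taken from the paper.

```python
import math
import random


def update_sampling_probs(scores, temperature=1.0):
    """Turn per-composition scores into sampling probabilities via a softmax.

    ASSUMPTION: `scores` is a hypothetical per-composition signal (for
    example, the mean pretraining loss observed in the last epoch). The
    criterion actually used by APE is not specified in this summary.
    """
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_composition(compositions, probs, rng):
    """Draw one augmentation composition according to the current probabilities."""
    return rng.choices(compositions, weights=probs, k=1)[0]


# Hypothetical augmentation compositions; the names are illustrative only.
compositions = [("crop", "color_jitter"), ("crop", "blur"), ("flip", "grayscale")]
probs = [1.0 / len(compositions)] * len(compositions)  # start from uniform
rng = random.Random(0)

for epoch in range(3):
    # Placeholder scores; a real run would measure these during pretraining.
    scores = [rng.random() for _ in compositions]
    # Probabilities used to sample compositions in the next epoch.
    probs = update_sampling_probs(scores)
    chosen = sample_composition(compositions, probs, rng)
```

With this rule, compositions with higher scores are sampled more often in the following epoch, and a lower `temperature` makes the distribution more peaked. Any other monotone mapping from scores to probabilities would fit the caption's description equally well.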