Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Donghu Kim, Hojoon Lee, Kyungmin Lee, Dongyoon Hwang, Jaegul Choo
TL;DR
The paper studies how pre-training objectives shape generalization in vision-based RL under distribution shift. It introduces Atari-PB, a unified benchmark that pre-trains a ResNet-50 on 10 million transitions from 50 Atari games and evaluates it in in-distribution (ID), near-out-of-distribution (Near-OOD), and far-out-of-distribution (Far-OOD) environments. The objectives compared are grouped by the data they use: images, videos, demonstrations, and trajectories. Task-agnostic pre-training, which captures spatial and temporal structure, consistently improves generalization across all three distributions. Task-specific pre-training, which uses demonstrations or rewards, benefits ID and Near-OOD performance but often harms Far-OOD performance; trajectory-based pre-training yields strong ID results. The results highlight the value of temporal dynamics for generalization and suggest future architectures that decouple task-agnostic from task-specific features to improve downstream RL across diverse environments.
Abstract
Recently, various pre-training methods have been introduced in vision-based Reinforcement Learning (RL). However, their generalization ability remains unclear because evaluations have been limited to in-distribution environments and experimental setups have not been unified. To address this, we introduce the Atari Pre-training Benchmark (Atari-PB), which pre-trains a ResNet-50 model on 10 million transitions from 50 Atari games and evaluates it across diverse environment distributions. Our experiments show that pre-training objectives focused on learning task-agnostic features (e.g., identifying objects and understanding temporal dynamics) enhance generalization across different environments. In contrast, objectives focused on learning task-specific knowledge (e.g., identifying agents and fitting reward functions) improve performance in environments similar to the pre-training dataset but not in varied ones. We release our code, datasets, and model checkpoints at https://github.com/dojeon-ai/Atari-PB.
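As a rough illustration, the benchmark setup described above can be written down as a small configuration object. This is only a sketch of the numbers stated in the text, not the released code: the class name `AtariPBSetup`, its field names, and the `transitions_per_game` helper are hypothetical, and the even split of transitions across games is an assumption made here for the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtariPBSetup:
    """Hypothetical summary of the setup stated in the abstract and TL;DR."""

    backbone: str = "ResNet-50"
    num_pretrain_games: int = 50
    num_pretrain_transitions: int = 10_000_000
    # Evaluation distributions named in the TL;DR.
    eval_splits: tuple = ("ID", "Near-OOD", "Far-OOD")
    # Data types the compared pre-training objectives draw on.
    data_types: tuple = ("image", "video", "demonstration", "trajectory")

    def transitions_per_game(self) -> int:
        # Assumption: transitions are divided evenly across games.
        # The text only states the totals, not the per-game split.
        return self.num_pretrain_transitions // self.num_pretrain_games


setup = AtariPBSetup()
print(setup.backbone, setup.transitions_per_game(), setup.eval_splits)
```

Under the even-split assumption, each game would contribute 200,000 transitions to the pre-training dataset.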
