Enhancing Two-Player Performance Through Single-Player Knowledge Transfer: An Empirical Study on Atari 2600 Games

Kimiya Saadat, Richard Zhao

TL;DR

The paper investigates improving two-player reinforcement learning by transferring knowledge from the corresponding single-player Atari game, using RAM observations to enable efficient training. Through an empirical study across ten Atari environments, it demonstrates that transferring prelearned representations and freezing early network layers yields at least comparable, and often superior, performance with a substantial reduction in training time. It also introduces a RAM-based complexity visualization and a simple predictor indicating that higher RAM complexity can correlate with greater transfer benefits in some games. These findings suggest a practical approach to stabilizing and accelerating multi-agent RL and provide a RAM-centric tool for anticipating transfer gains.
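
The transfer-and-freeze idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the layer sizes, the six-action output, the number of frozen layers, and the helper names `init_mlp`, `transfer`, and `sgd_step` are all assumptions made for illustration, and the gradients are placeholders rather than the output of a real reinforcement learning loss.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Create a list of (W, b) layers for an MLP with the given layer sizes."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def transfer(source_net, n_frozen):
    """Copy the source weights and mark the first n_frozen layers as frozen."""
    weights = [(W.copy(), b.copy()) for W, b in source_net]
    trainable = [i >= n_frozen for i in range(len(weights))]
    return weights, trainable


def sgd_step(weights, grads, trainable, lr=0.01):
    """Apply one SGD update, leaving frozen layers untouched."""
    return [
        (W - lr * gW, b - lr * gb) if train else (W, b)
        for (W, b), (gW, gb), train in zip(weights, grads, trainable)
    ]


rng = np.random.default_rng(0)

# Hypothetical network: 128 RAM bytes in, two hidden layers, 6 actions out.
single_player_net = init_mlp([128, 64, 64, 6], rng)

# Start the two-player agent from the single-player weights, freezing layer 0.
two_player_net, trainable = transfer(single_player_net, n_frozen=1)

# Placeholder gradients (all ones) standing in for a real RL loss gradient.
grads = [(np.ones_like(W), np.ones_like(b)) for W, b in two_player_net]
updated = sgd_step(two_player_net, grads, trainable)
```

In this sketch the frozen first layer keeps its single-player weights exactly, while the later layers continue to adapt to the two-player game. Skipping updates for frozen layers is one plausible source of the reduced training time.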

Abstract

Playing two-player games using reinforcement learning and self-play can be challenging due to the complexity of two-player environments and the possible instability in the training process. We propose that a reinforcement learning algorithm can train more efficiently and achieve improved performance in a two-player game if it leverages the knowledge from the single-player version of the same game. This study examines the proposed idea in ten different Atari 2600 environments using the Atari 2600 RAM as the input state. We discuss the advantages of using transfer learning from a single-player training process over training in a two-player setting from scratch, and report results on measures such as training time and average total reward. We also discuss a method of calculating RAM complexity and its relationship to performance.

Paper Structure

This paper contains 17 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: The average total reward of player 1 in the ten Atari games for the transferred agent (in blue) and the agent training from scratch (in orange), in the two-player versions of the games. The x-axis is the number of episodes and the y-axis is the average total reward per episode.
  • Figure 2: Heatmap visualizations of RAM complexity for the ten games. Each pixel represents a RAM byte. A brighter color denotes higher temporal variation in that byte, whereas a darker color denotes lower temporal variation.
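
A heatmap of the kind described for Figure 2 could be computed along the following lines. This is a hedged sketch rather than the paper's method: the exact complexity measure is not given here, so the per-byte standard deviation over time is used as an assumed stand-in for "temporal variation". The 8 x 16 grid layout and the function names `ram_variation` and `as_heatmap` are also illustrative choices. The trace is synthetic; in practice it would be recorded from an emulator.

```python
import numpy as np

RAM_SIZE = 128  # the Atari 2600 has 128 bytes of RAM


def ram_variation(ram_trace):
    """Per-byte temporal variation: standard deviation of each byte over time.

    ram_trace has shape (timesteps, RAM_SIZE), one RAM snapshot per row.
    """
    trace = np.asarray(ram_trace, dtype=np.float64)
    if trace.ndim != 2 or trace.shape[1] != RAM_SIZE:
        raise ValueError(f"expected shape (timesteps, {RAM_SIZE}), got {trace.shape}")
    return trace.std(axis=0)


def as_heatmap(variation, shape=(8, 16)):
    """Scale variation to [0, 1] and lay the bytes out on a 2-D grid."""
    peak = variation.max()
    scaled = variation / peak if peak > 0 else np.zeros_like(variation)
    return scaled.reshape(shape)


# Synthetic trace: byte 5 changes randomly every step, byte 7 is constant,
# and every other byte stays at zero.
rng = np.random.default_rng(0)
trace = np.zeros((1000, RAM_SIZE), dtype=np.uint8)
trace[:, 5] = rng.integers(0, 256, size=1000)
trace[:, 7] = 42

heatmap = as_heatmap(ram_variation(trace))
```

Here the cell for byte 5 is the brightest and all constant bytes are dark. A single scalar summary, such as the mean of the unscaled variation, could then serve as a rough per-game complexity score, though that aggregation is likewise an assumption.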