Beyond The Rainbow: High Performance Deep Reinforcement Learning on a Desktop PC

Tyler Clark, Mark Towers, Christine Evers, Jonathon Hare

TL;DR

We present Beyond The Rainbow (BTR), a novel algorithm that integrates six improvements from across the RL literature into Rainbow DQN, establishing a new state-of-the-art for RL using a desktop PC, with a human-normalized interquartile mean (IQM) of 7.4 on Atari-60.

Abstract

Rainbow Deep Q-Network (DQN) demonstrated that combining multiple independent enhancements could significantly boost a reinforcement learning (RL) agent's performance. In this paper, we present "Beyond The Rainbow" (BTR), a novel algorithm that integrates six improvements from across the RL literature into Rainbow DQN, establishing a new state-of-the-art for RL using a desktop PC, with a human-normalized interquartile mean (IQM) of 7.4 on Atari-60. Beyond Atari, we demonstrate BTR's capability to handle complex 3D games, successfully training agents to play Super Mario Galaxy, Mario Kart, and Mortal Kombat with minimal algorithmic changes. BTR is designed with computational efficiency in mind: agents can be trained on 200 million Atari frames within 12 hours using a high-end desktop PC. Additionally, we conduct detailed ablation studies of each component, analyzing its performance and impact using numerous measures. Code is available at https://github.com/VIPTankz/BTR.
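
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a human-normalized IQM can be computed. It assumes the common convention of normalizing each score as (agent - random) / (human - random) and then taking a 25% trimmed mean over all pooled runs. The function names and the numbers are illustrative only; they are not taken from the BTR code base or from the paper's results.

```python
import numpy as np


def human_normalized(agent, random, human):
    """Rescale a raw game score so that 0 = random play and 1 = human play."""
    return (agent - random) / (human - random)


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (a 25% trimmed mean).

    Scores from all runs and games are pooled, sorted, and the lowest and
    highest quarter (rounded down) are discarded before averaging.
    """
    flat = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = flat.size // 4  # number of values dropped at each end
    return float(flat[cut:flat.size - cut].mean())


# Made-up normalized scores: one extreme outlier (100) barely moves the IQM,
# whereas it would dominate a plain mean.
example_scores = [0, 1, 2, 3, 4, 5, 6, 100]
iqm = interquartile_mean(example_scores)
plain_mean = float(np.mean(example_scores))
```

Because the top and bottom quarters are discarded, a single game with a very large normalized score cannot dominate the aggregate, which is one reason the IQM is often preferred over the mean when summarizing Atari results.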

Paper Structure

This paper contains 38 sections, 3 equations, 19 figures, 12 tables.

Figures (19)

  • Figure 1: Interquartile mean human-normalized performance for BTR against other RL algorithms on the Atari benchmark in terms of walltime performance (all results use 200M frames). The results for DQN and Rainbow DQN are those reported in RLiable agarwal2021deep, and Dreamer-v3 refers to hafner2301mastering. Shaded areas show 95% bootstrapped confidence intervals, with BTR using 4 seeds.
  • Figure 2: Box plot performance of BTR (4 seeds) against other popular algorithms such as MEME kapturowski2022human, Dreamer-v3 hafner2301mastering, Bigger, Better, Faster (BBF) schwarzer2023bigger and EfficientZero-v2 (EZV2) wang2024efficientzero. Brackets show the number of frames each algorithm uses, the number of walltime hours and the hardware used, respectively. Shaded areas show 95% confidence intervals. Top: Atari 55-game benchmark, consisting of the 55 games that overlap between the popular Atari-57 benchmark and the 60 games used in RLiable agarwal2021deep. Bottom: Atari-26 benchmark, commonly used for testing sample-efficient algorithms.
  • Figure 3: BTR compared to Rainbow DQN + Impala (width x4) cobbe2020leveraging after 200M frames on the Procgen benchmark. Shaded areas show 95% CIs, with results averaged over 5 seeds.
  • Figure 4: BTR being used to play Super Mario Galaxy (final level), Mario Kart Wii (Rainbow Road) and Mortal Kombat: Armageddon (Endurance Mode) respectively. Consistent completion is defined as over 90%.
  • Figure 5: BTR's human-normalized scores without different components, with shaded areas showing 95% bootstrapped confidence intervals averaged over 4 seeds. Left: Predicted Atari-57 median score using the regression procedure defined in aitchison2023atari. However, we find the prediction does not match the true median (see Appendix \ref{app:atari5}). Right: Interquartile mean across the 5 games. For individual game graphs and additional ablations, see Appendices \ref{app:full-results-graph} and \ref{app:extra_ablations}.
  • ...and 14 more figures