
Pokemon Red via Reinforcement Learning

Marco Pleines, Daniel Addis, David Rubinstein, Frank Zimmer, Mike Preuss, Peter Whidden

TL;DR

This work frames Pokémon Red as a challenging long-horizon DRL benchmark and presents a minimal PPO-based baseline trained in a simplified yet nontrivial environment that extends up to Cerulean City. It formalizes the game as an MDP with multimodal observations, a discrete action space, and a dense reward schedule, then analyzes how reward shaping, exploration, and horizon length affect learning. Key findings show robust early progress but also vulnerability to reward exploits and difficulty managing long horizons, highlighting the need for hierarchical policies and curiosity-driven exploration. The study positions Pokémon Red as a fertile testbed for future research on LLM-based agents, hierarchical DRL, and advanced exploration strategies, with practical implications for long-horizon decision-making in open-world tasks.
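The TL;DR names PPO as the training algorithm. For readers unfamiliar with it, here is a minimal NumPy sketch of the standard PPO clipped surrogate loss. It is the textbook objective, not the authors' neroRL implementation; the function name `ppo_clip_loss`, the default `clip_eps=0.2`, and the omission of the value-loss and entropy terms are assumptions made for brevity.

```python
import numpy as np


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate loss, returned as a value to minimize.

    new_log_probs: log pi_theta(a_t | s_t) under the policy being optimized.
    old_log_probs: log pi_old(a_t | s_t) under the policy that collected the data.
    advantages:    advantage estimates A_t (e.g. from GAE).
    """
    # Probability ratio r_t(theta) = pi_theta / pi_old, computed in log space.
    ratio = np.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping the ratio to [1 - eps, 1 + eps] limits the size of a policy update.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum (a pessimistic bound); negate for a loss.
    return -np.mean(np.minimum(unclipped, clipped))


# Action probability rises from 0.5 to 0.8 (ratio 1.6) with a positive advantage.
loss = ppo_clip_loss(np.log([0.8]), np.log([0.5]), np.array([1.0]))
```

With a ratio of 1.6 and an advantage of 1.0, the clipped term caps the ratio at 1.2, so the loss is -1.2; the gradient through the clipped branch vanishes, which discourages moving the policy far from the one that gathered the rollout.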

Abstract

Pokémon Red, a classic Game Boy JRPG, presents significant challenges as a testbed for agents, including multi-tasking, long horizons of tens of thousands of steps, hard exploration, and a vast array of potential policies. We introduce a simplified environment and a Deep Reinforcement Learning (DRL) training methodology, demonstrating a baseline agent that completes an initial segment of the game through the completion of Cerulean City. Our experiments include various ablations that reveal vulnerabilities in reward shaping, where agents exploit specific reward signals. We also discuss limitations and argue that games like Pokémon hold strong potential for future research on Large Language Model agents, hierarchical training algorithms, and advanced exploration methods. Source Code: https://github.com/MarcoMeter/neroRL/tree/poke_red

Paper Structure

This paper contains 21 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: All subfigures depict game screens the agent may perceive, illustrating challenges in Pokémon Red that involve exploration, navigation, and strategic decision-making. The agent must traverse a 2D overworld (Route 1) with a party of Pokémon, win mandatory trainer battles (Route 3), and navigate complex maze-like areas (Mt. Moon). Progression also depends on interacting with the game's UI, including healing Pokémon at a Pokémon Center, using strategy to win battles (e.g., against a Geodude), and using items from the item menu. Overcoming obstacles such as cuttable trees (Vermilion City) requires mandatory game-mechanic moves, which must be obtained through exploration, taught to eligible Pokémon, and used from the Start menu when facing the obstacle.
  • Figure 2: Actor-Critic Network (2M parameters, GRU: 4M).
  • Figure 3: Mean completion curves for distinct milestones. Beating Misty can be skipped to reach Vermilion City.
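Figure 2 gives only parameter counts for the actor-critic network, so its exact layers are not described here. As a rough illustration of what a recurrent actor-critic looks like, the NumPy sketch below assumes a flat observation vector (the paper's observations are multimodal), a ReLU encoder, a single GRU cell as memory, and linear policy and value heads. The class `RecurrentActorCritic`, its sizes, and its initialization are all hypothetical and are not the authors' architecture.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class RecurrentActorCritic:
    """Hypothetical actor-critic: ReLU encoder -> GRU cell -> policy and value heads."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, size=shape)

        self.W_enc, self.b_enc = init(obs_dim, hidden_dim), np.zeros(hidden_dim)
        # GRU parameters for the update (z), reset (r) and candidate (n) gates.
        self.W = {g: init(hidden_dim, hidden_dim) for g in "zrn"}  # input weights
        self.U = {g: init(hidden_dim, hidden_dim) for g in "zrn"}  # recurrent weights
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        self.W_pi = init(hidden_dim, num_actions)  # actor head: action logits
        self.W_v = init(hidden_dim, 1)             # critic head: state value

    def gru(self, x, h):
        z = sigmoid(x @ self.W["z"] + h @ self.U["z"] + self.b["z"])
        r = sigmoid(x @ self.W["r"] + h @ self.U["r"] + self.b["r"])
        n = np.tanh(x @ self.W["n"] + (r * h) @ self.U["n"] + self.b["n"])
        return (1.0 - z) * n + z * h  # blend new candidate with previous memory

    def step(self, obs, h):
        """One time step for a batch: obs (B, obs_dim), h (B, hidden_dim)."""
        x = np.maximum(0.0, obs @ self.W_enc + self.b_enc)
        h = self.gru(x, h)
        probs = softmax(h @ self.W_pi)
        value = (h @ self.W_v)[:, 0]
        return probs, value, h

    def num_params(self):
        arrays = [self.W_enc, self.b_enc, self.W_pi, self.W_v]
        arrays += list(self.W.values()) + list(self.U.values()) + list(self.b.values())
        return sum(a.size for a in arrays)


# Arbitrary toy sizes; the hidden state h is carried across environment steps.
net = RecurrentActorCritic(obs_dim=8, hidden_dim=16, num_actions=7)
h = np.zeros((4, 16))
obs = np.random.default_rng(1).normal(size=(4, 8))
probs, value, h = net.step(obs, h)
```

The shared trunk lets the actor and critic reuse one memory state, and counting parameters with `num_params` shows how the recurrent gate matrices (three input and three recurrent weight matrices) add substantially to model size.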