Pokémon Red via Reinforcement Learning
Marco Pleines, Daniel Addis, David Rubinstein, Frank Zimmer, Mike Preuss, Peter Whidden
TL;DR
This work frames Pokémon Red as a challenging long-horizon DRL benchmark and presents a minimal PPO-based baseline trained in a simplified, yet nontrivial, environment that covers the game up to Cerulean City. It formalizes the game as an MDP with multimodal observations, a discrete action space, and a dense reward schedule, then analyzes how reward shaping, exploration, and horizon length affect learning. The agent makes robust early progress but remains vulnerable to reward-driven exploits and to the difficulty of managing long horizons, which highlights the need for hierarchical policies and curiosity-driven exploration. The study positions Pokémon Red as a fertile testbed for future research, including LLM-based agents, hierarchical DRL, and advanced exploration strategies, with practical implications for long-horizon decision-making in open-world tasks.
Abstract
Pokémon Red, a classic Game Boy JRPG, presents significant challenges as a testbed for agents, including multi-tasking, long horizons of tens of thousands of steps, hard exploration, and a vast array of potential policies. We introduce a simplified environment and a Deep Reinforcement Learning (DRL) training methodology, demonstrating a baseline agent that completes an initial segment of the game, up to and including Cerulean City. Our experiments include various ablations that reveal vulnerabilities in reward shaping, where agents exploit specific reward signals. We also discuss limitations and argue that games like Pokémon hold strong potential for future research on Large Language Model agents, hierarchical training algorithms, and advanced exploration methods. Source Code: https://github.com/MarcoMeter/neroRL/tree/poke_red
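
To illustrate the kind of reward-shaping vulnerability mentioned above, the following is a minimal sketch and not the reward function used in this work. The class `ShapedReward`, its weights, and the "heal" event are hypothetical and chosen only for illustration. It compares an exploring trajectory with one that repeatedly triggers a single rewarded event.

```python
from dataclasses import dataclass, field


@dataclass
class ShapedReward:
    """Hypothetical dense reward: pays for new tiles and for a 'heal' event."""
    tile_bonus: float = 0.01        # assumed weight for visiting a new map tile
    heal_bonus: float = 0.1         # assumed weight for a heal event
    count_every_heal: bool = True   # if True, every heal pays out (exploitable)
    seen_tiles: set = field(default_factory=set)
    healed_once: bool = False

    def step(self, tile, healed):
        reward = 0.0
        # Novelty term: only the first visit to a tile is rewarded.
        if tile not in self.seen_tiles:
            self.seen_tiles.add(tile)
            reward += self.tile_bonus
        # Event term: either every heal counts, or only the first one.
        if healed and (self.count_every_heal or not self.healed_once):
            reward += self.heal_bonus
        if healed:
            self.healed_once = True
        return reward


def episode_return(reward_fn, trajectory):
    """Sum the shaped reward over a list of (tile, healed) steps."""
    return sum(reward_fn.step(tile, healed) for tile, healed in trajectory)


# One trajectory explores 50 distinct tiles; the other stands still and heals 50 times.
explore = [((x, 0), False) for x in range(50)]
farm = [((0, 0), True) for _ in range(50)]

naive_explore = episode_return(ShapedReward(count_every_heal=True), explore)
naive_farm = episode_return(ShapedReward(count_every_heal=True), farm)
capped_farm = episode_return(ShapedReward(count_every_heal=False), farm)

print(f"explore: {naive_explore:.2f}")
print(f"farm (every heal rewarded): {naive_farm:.2f}")
print(f"farm (first heal only): {capped_farm:.2f}")
```

Under these assumed weights, farming the repeatable event yields a higher return than exploring, so a return-maximizing agent would prefer it. Rewarding the event only once removes that incentive in this toy setting.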
