NFQ2.0: The CartPole Benchmark Revisited
Sascha Lange, Roland Hafner, Martin Riedmiller
TL;DR
NFQ2.0 revisits neural fitted Q-iteration (NFQ), showing that a modern batch-learning variant can compete with contemporary Deep RL methods on a real-world CartPole system. By adopting larger networks, continuous training of a single network, stacking, and hybrid offline/online strategies (including hindsight relabeling and offline bootstrapping), the approach achieves stable, repeatable learning with relatively small data requirements. The work provides detailed ablations and offline-online comparisons that illustrate how cost shaping, action encoding, and training regimes influence learning speed and robustness, and it demonstrates practical techniques for transferring RL to industrial contexts. Overall, NFQ2.0 offers a practical, open-source, and transferable framework for applying deep RL to real systems, with clear guidance on parameter choices and on offline strategies that reduce cycle time and risk.
Abstract
This article revisits the 20-year-old neural fitted Q-iteration (NFQ) algorithm on its classical CartPole benchmark. NFQ was a pioneering step towards modern Deep Reinforcement Learning (Deep RL), applying multi-layer neural networks to reinforcement learning for real-world control problems. We explore the algorithm's conceptual simplicity and its transition from online to batch learning, which contributed to its stability. Despite its initial success, NFQ required extensive tuning and was not easily reproducible on real-world control problems. We propose a modernized variant, NFQ2.0, and apply it to the CartPole task, concentrating on a real-world system built from standard industrial components, to investigate and improve the repeatability and robustness of the learning process. Through ablation studies, we highlight key design decisions and hyperparameters that enhance the performance and stability of NFQ2.0 over the original variant. Finally, we demonstrate how our findings can assist practitioners in reproducing and improving results and in applying deep reinforcement learning more effectively in industrial contexts.
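To make the batch-learning idea behind fitted Q-iteration concrete, the following is a minimal sketch of how regression targets could be computed from a stored batch of transitions. It is an illustration under stated assumptions, not the authors' implementation: the function name `fitted_q_targets`, the callable `q_fn`, the discount `gamma=0.98`, and the choice to skip bootstrapping on terminal transitions are all hypothetical. It follows the cost-minimizing formulation commonly associated with NFQ, in which the target is the immediate cost plus the discounted minimum Q-value over the next actions.

```python
import numpy as np

def fitted_q_targets(q_fn, costs, next_states, terminal, actions, gamma=0.98):
    """Compute cost-based fitted Q-iteration targets for a batch of transitions.

    q_fn(states, actions) is a hypothetical callable returning one Q-value per row.
    """
    costs = np.asarray(costs, dtype=float)
    next_states = np.asarray(next_states, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    n = len(next_states)
    # Evaluate Q(s', a') for every candidate action; result has shape (n, n_actions).
    q_next = np.stack(
        [q_fn(next_states, np.full(n, a, dtype=float)) for a in actions], axis=1
    )
    # Costs are minimized, so bootstrap from the smallest next-state Q-value.
    min_q_next = q_next.min(axis=1)
    # Assumption: terminal transitions contribute only their immediate cost.
    return costs + gamma * np.where(terminal, 0.0, min_q_next)
```

In a full batch loop, these targets would be recomputed from the current Q-function over the whole transition set and then used as supervised regression labels for the network, with the two steps repeated.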
