Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening
Frank S. He, Yang Liu, Alexander G. Schwing, Jian Peng
TL;DR
The paper introduces optimality tightening, which augments deep Q-learning with a constrained optimization approach that propagates rewards more efficiently, reducing both the training data and the training time required. For each replayed transition, the method derives multi-step lower and upper bounds on its Q-value from the neighboring steps of the stored sequence and enforces them through a quadratic penalty, which accelerates convergence while keeping training stable. Empirical results on 49 Atari games show substantial improvements in training speed and performance: the method achieves strong results using only 10M frames (vs. 200M for DQN) and outperforms baselines on many titles. The approach is compatible with other DQN enhancements, making it a promising practical route to faster, more data-efficient deep reinforcement learning in complex environments.
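The bound-and-penalty idea can be illustrated with a small Python sketch. This is a hedged illustration under stated assumptions, not the authors' implementation. For a transition at step `j` of a replayed trajectory, a lower bound `L_m` combines `m` observed rewards with the target network's bootstrap value `m` steps ahead. An upper bound `U_m` inverts the same relation starting from the action taken `m` steps earlier. The loss is the usual one-step DQN error plus `lam` times the squared bound violations. The function names, the bound range `m = 1..K`, and the default `gamma`, `K`, and `lam` values are hypothetical choices made here. The sketch works on scalar Q-values with no network or gradients, and it ignores terminal states and episode boundaries.

```python
import math
from typing import Sequence


def lower_bound_max(j: int, rewards: Sequence[float], q_max: Sequence[float],
                    gamma: float, K: int) -> float:
    """Largest m-step lower bound on Q(s_j, a_j) for m = 1..K.

    L_m = sum_{i<m} gamma^i * r_{j+i} + gamma^m * max_b Q_target(s_{j+m}, b).
    q_max[t] holds max_b Q_target(s_t, b); returns -inf if no step is available.
    """
    best = -math.inf
    ret = 0.0  # discounted sum of the rewards r_j .. r_{j+m-1}
    for m in range(1, K + 1):
        if j + m >= len(q_max):  # ran past the stored trajectory
            break
        ret += gamma ** (m - 1) * rewards[j + m - 1]
        best = max(best, ret + gamma ** m * q_max[j + m])
    return best


def upper_bound_min(j: int, rewards: Sequence[float], q_taken: Sequence[float],
                    gamma: float, K: int) -> float:
    """Smallest m-step upper bound on Q(s_j, a_j) for m = 1..K.

    U_m = gamma^(-m) * Q_target(s_{j-m}, a_{j-m}) - sum_{i<m} gamma^(i-m) * r_{j-m+i}.
    q_taken[t] holds Q_target(s_t, a_t); returns +inf if no earlier step exists.
    """
    best = math.inf
    for m in range(1, K + 1):
        if j - m < 0:  # no earlier step in the stored trajectory
            break
        past = sum(gamma ** (i - m) * rewards[j - m + i] for i in range(m))
        best = min(best, gamma ** (-m) * q_taken[j - m] - past)
    return best


def tightened_loss(q: float, j: int, rewards: Sequence[float],
                   q_taken: Sequence[float], q_max: Sequence[float],
                   gamma: float = 0.99, K: int = 4, lam: float = 4.0) -> float:
    """One-step TD error plus quadratic penalties for violated bounds (hypothetical)."""
    y = rewards[j] + gamma * q_max[j + 1]  # standard one-step DQN target
    l_max = lower_bound_max(j, rewards, q_max, gamma, K)
    u_min = upper_bound_min(j, rewards, q_taken, gamma, K)
    lower_violation = max(0.0, l_max - q)  # nonzero only if q < l_max
    upper_violation = max(0.0, q - u_min)  # nonzero only if q > u_min
    return (q - y) ** 2 + lam * (lower_violation ** 2 + upper_violation ** 2)


# Pessimistic target network (all zeros), reward 1 per step, gamma = 0.5.
rewards = [1.0, 1.0, 1.0, 1.0]
q_max = [0.0] * 5
q_taken = [0.0] * 4
print(tightened_loss(1.0, 0, rewards, q_taken, q_max, gamma=0.5, K=3, lam=4.0))  # → 2.25
```

In the usage example, the one-step target is 1.0, so plain Q-learning sees zero error at `q = 1.0`. The three-step lower bound, however, is 1 + 0.5 + 0.25 = 1.75. The penalty therefore pushes the estimate upward using rewards observed further along the trajectory. This illustrates how tightening can propagate reward faster than one-step bootstrapping alone.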
Abstract
We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster reward propagation. Our novel technique makes deep reinforcement learning more practical by drastically reducing the training time. We evaluate the performance of our approach on the 49 games of the challenging Arcade Learning Environment, and report significant improvements in both training time and accuracy.
