Learning to Play Video Games with Intuitive Physics Priors
Abhishek Jaiswal, Nisheeth Srivastava
TL;DR
This work addresses the limited generalization of deep reinforcement learning in video games by introducing intuitive physics priors in the form of affordance-based object-category representations. A Q-learning agent operates on a compact state space derived from five object categories, enabling infant-like learning that transfers across game variants and to custom games. Experiments show faster learning and greater robustness to variations than pixel-based DQN, particularly under object perturbations, though performance can degrade in highly dynamic scenarios with narrow state reach. The findings point toward human-centric learning that generalizes beyond the training domain, and they motivate model-based extensions and broader applications.
Abstract
Video game playing is a highly structured domain in which algorithmic decision-making can be tested without adverse real-world consequences. While prevailing methods rely on image inputs to avoid hand-crafting state-space representations, this approach diverges systematically from the way humans actually learn to play games. In this paper, we design object-based input representations that generalize well across a number of video games. Using these representations, we evaluate an agent's ability to learn games as an infant would: with limited world experience, employing simple inductive biases derived from intuitive representations of real-world physics. Using such biases, we construct an object-category representation for a Q-learning algorithm and assess how well it learns to play multiple games based on observed object affordances. Our results suggest that a human-like object-interaction setup learns to play several video games capably and demonstrates superior generalizability, particularly for unfamiliar objects. Further exploration of such methods will allow machines to learn in a more human-centric way and to share more of the benefits of human-like learning.
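
To make the setup concrete, the following is a minimal sketch of tabular Q-learning over a compact, category-based state. It is an illustration under stated assumptions, not the implementation used in this work: the category labels, the action set, the `encode_state` helper, and the hyperparameters are all hypothetical placeholders.

```python
import random
from collections import defaultdict

# Hypothetical placeholders: the actual object categories and action set
# are not specified in this section.
CATEGORIES = ("cat_0", "cat_1", "cat_2", "cat_3", "cat_4")
ACTIONS = ("left", "right", "up", "down")


def encode_state(nearby):
    """Map {direction: category} observations to a hashable compact state.

    Directions with no observed object are encoded as "none".
    """
    return tuple(nearby.get(direction, "none") for direction in ACTIONS)


class QAgent:
    """Tabular Q-learning agent keyed on (state, action) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # unseen pairs default to 0.0
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy action selection.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning update.
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in ACTIONS
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (
            target - self.q[(state, action)]
        )
```

Because the state is a short tuple of category labels rather than a pixel array, the Q-table stays small, and an unfamiliar object that falls into a known category maps onto a state the agent has already visited.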
