Learning to Play Video Games with Intuitive Physics Priors

Abhishek Jaiswal, Nisheeth Srivastava

TL;DR

This work tackles the limited generalization of deep reinforcement learning in video games by introducing intuitive physics priors in the form of affordance-based, object-category representations. A Q-learning agent operates on a compact state space derived from five object categories, enabling infant-like learning that transfers across game variants and even to custom games. Experimental results show faster learning and better robustness to variations than pixel-based DQN, particularly under object perturbations, though performance can degrade in highly dynamic scenarios with narrow state reach. The findings suggest a promising path toward human-centric learning that generalizes beyond the training domain, and they motivate future model-based extensions and broader applicability.

Abstract

Video game playing is an extremely structured domain where algorithmic decision-making can be tested without adverse real-world consequences. While prevailing methods rely on image inputs to avoid the problem of hand-crafting state space representations, this approach systematically diverges from the way humans actually learn to play games. In this paper, we design object-based input representations that generalize well across a number of video games. Using these representations, we evaluate an agent's ability to learn games similar to an infant - with limited world experience, employing simple inductive biases derived from intuitive representations of physics from the real world. Using such biases, we construct an object category representation to be used by a Q-learning algorithm and assess how well it learns to play multiple games based on observed object affordances. Our results suggest that a human-like object interaction setup capably learns to play several video games, and demonstrates superior generalizability, particularly for unfamiliar objects. Further exploring such methods will allow machines to learn in a human-centric way, thus incorporating more human-like learning benefits.
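
To make the setup described above more concrete, the following is a minimal sketch of tabular Q-learning over an object-category state, written under stated assumptions rather than taken from the paper. The category labels (`cat_0` to `cat_4`), the action set, the `encode_state` helper, and all hyperparameter values are hypothetical placeholders; the text here says only that five object categories are used. The update rule is the standard Q-learning update.

```python
import random
from collections import defaultdict

# Hypothetical labels: the source states only that there are five categories.
CATEGORIES = ("cat_0", "cat_1", "cat_2", "cat_3", "cat_4")
ACTIONS = ("left", "right", "fire", "noop")  # assumed action set


def encode_state(nearby_objects):
    """Map (category, dx, dy) observations to a compact, hashable state.

    Only the category and the coarse direction of each object are kept,
    so objects that differ in colour, sprite, or exact offset share a state.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    return tuple(sorted((cat, sign(dx), sign(dy)) for cat, dx, dy in nearby_objects))


class TabularQAgent:
    """Epsilon-greedy tabular Q-learning (illustrative hyperparameters)."""

    def __init__(self, actions=ACTIONS, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.actions = tuple(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> value, defaults to 0.0
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

Because the state keeps only categories and coarse directions, two observations such as `[("cat_1", 5, -3)]` and `[("cat_1", 40, -1)]` map to the same table entry. This is one plausible way such a representation could stay robust when object positions or appearances are perturbed; the paper's actual encoding may differ.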

Paper Structure (17 sections, 1 equation, 5 figures, 1 table)

Figures (5)

  • Figure 1: Simple Variations, Crippling Results - Deep learning models break down even under a slight variation of the environment (right image: partially randomized enemy positions).
  • Figure 2: Schematic representation of Agent-Action pipeline based upon intuitive physics priors.
  • Figure 3: From Left to Right, each column shows the original games, Position Modification, Color and Size Modification, and Image Modification. For GVGAI games, all objects are rectangles with fixed shapes and sizes, and Image modification is not applicable. For Roadrash, enemy cars are randomly spread over the road, and thus Position Modification is not applicable.
  • Figure 4: MyAliens State Representation.
  • Figure 5: Normalized score per epoch for affordance-based Q-learning (ours) vs. image-based DQN. a) MyAliensV1: DQN is probably still exploring, as it had not learned any meaningful action. b) MyAliensV2: both algorithms struggled; Q-learning still fares better, and DQN could not clear even the first level in either variant of MyAliens. c) Roadrash: a highly stochastic game with many situations where avoiding a collision is impossible; Q-learning still does better than DQN. d) SpaceInvaders: our algorithm easily learns the gameplay using its object-based representation.