Graying the black box: Understanding DQNs
Tom Zahavy, Nir Ben Zrihem, Shie Mannor
TL;DR
The paper investigates how Deep Q-Networks (DQNs) learn internal structure from high-dimensional inputs. It proposes the Semi Aggregated Markov Decision Process (SAMDP) for extracting spatio-temporal abstractions. It introduces manual clustering and SAMDP as tools for interpretability, debugging, and sub-goal detection, and demonstrates them on Gridworld and several Atari games (Breakout, Seaquest, Pacman). SAMDP combines temporal and spatial abstractions to identify options and hierarchical policy structure. This supports analysis of policy behavior, of how initial and terminal states are handled, and of how score pixels influence the learned features. The work shows that DQNs organize the state space into sub-manifolds with low-entropy transitions between them. This points toward more interpretable and potentially more efficient deep reinforcement learning, including shared-autonomy safeguards via eject-based interventions.
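The claim about low-entropy transitions can be made concrete with a small calculation. The sketch below is not the authors' SAMDP algorithm. It assumes that each visited state has already been assigned a cluster label (for example, by clustering DQN activations). From the resulting label sequence, it estimates cluster-to-cluster transition probabilities and the entropy of each row. The function names `aggregated_transitions` and `row_entropy` are hypothetical. Counting only transitions that leave a cluster is a simplifying assumption, made so that each cluster acts like an aggregated state in a semi-MDP.

```python
import math
from collections import Counter, defaultdict


def aggregated_transitions(labels):
    """Estimate P(next cluster | current cluster) from a sequence of cluster labels.

    Hypothetical simplification: steps that stay inside the same cluster are
    ignored, so each probability describes where the agent goes when it leaves.
    """
    counts = defaultdict(Counter)
    for src, dst in zip(labels, labels[1:]):
        if src != dst:
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def row_entropy(row):
    """Shannon entropy (bits) of one row of the aggregated transition matrix."""
    return -sum(p * math.log2(p) for p in row.values() if p > 0)


# A cyclic trajectory 0 -> 1 -> 2 -> 0 gives deterministic (zero-entropy) rows.
cyclic = aggregated_transitions([0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2])
print({c: row_entropy(r) for c, r in cyclic.items()})

# Leaving cluster 0 toward two different clusters equally often gives 1 bit.
branching = aggregated_transitions([0, 1, 0, 2])
print(branching[0], row_entropy(branching[0]))
```

Under this reading, low row entropies mean the agent moves between aggregated states in a nearly deterministic order. That order is the kind of structure the TL;DR attributes to DQN features.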
Abstract
In recent years there has been growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-Networks (DQNs) in a non-blind manner. We also propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abstractions directly from features and may be used as a sub-goal detector in future work. Using our tools, we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, which helps explain their success. Moreover, we are able to understand and describe the policies learned by DQNs for three different Atari 2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.
