Implicit Quantile Networks for Distributional Reinforcement Learning

Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

TL;DR

This work extends distributional reinforcement learning by introducing implicit quantile networks (IQN), a simple yet powerful generalization that learns the full quantile function of the return distribution via a reparameterized tau input. IQN supports flexible sampling and enables distortion-based, risk-sensitive policies, connecting distributional modeling with practical control strategies. Empirically, IQN significantly outperforms QR-DQN and closely approaches Rainbow on the Atari-57 suite, with notable gains in hard games and in risk-averse settings. The approach offers a unified framework that improves data efficiency, policy expressiveness, and adaptability without extensive architectural overhauls.

Abstract

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm's implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.
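The two ideas in the abstract, quantile regression over sampled quantile fractions and risk-sensitive policies obtained by reweighting those fractions, can be illustrated with a minimal NumPy sketch. It assumes the quantile Huber loss commonly used in this line of work, with $N$ predicted quantiles and $N'$ target samples, and a CVaR-style distortion that maps $\tau$ to $\eta\tau$. The function names, the threshold `kappa`, and `toy_quantile` (a stand-in for a trained network) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def cvar_distortion(taus, eta):
    """CVaR-style distortion (assumed form): shrink tau ~ U[0, 1] into [0, eta]."""
    return eta * taus


def quantile_huber_loss(pred, taus, target, kappa=1.0):
    """Quantile Huber loss between N predicted quantiles and N' target samples.

    pred:   shape (N,), quantile estimates at fractions `taus`
    taus:   shape (N,), quantile fractions in [0, 1]
    target: shape (N',), samples of the target return
    """
    # Pairwise errors: delta[i, j] = target[j] - pred[i]
    delta = target[None, :] - pred[:, None]
    abs_delta = np.abs(delta)
    # Huber loss with threshold kappa
    huber = np.where(abs_delta <= kappa,
                     0.5 * delta ** 2,
                     kappa * (abs_delta - 0.5 * kappa))
    # Asymmetric weight |tau - 1{delta < 0}| turns the Huber loss into a quantile loss
    weight = np.abs(taus[:, None] - (delta < 0).astype(float))
    # Mean over the N' target samples, sum over the N predicted quantiles
    return (weight * huber / kappa).mean(axis=1).sum()


def toy_quantile(taus):
    """Hypothetical stand-in for a trained network: returns uniform on [0, 1]."""
    return taus


rng = np.random.default_rng(0)
taus = rng.uniform(size=1000)
# Risk-neutral value: average the quantile function over tau ~ U[0, 1]
risk_neutral_value = toy_quantile(taus).mean()
# Risk-averse value: average over distorted fractions that cover only the lower tail
risk_averse_value = toy_quantile(cvar_distortion(taus, eta=0.25)).mean()
```

In this toy setting the risk-averse estimate is lower than the risk-neutral one because the distorted samples only cover the lower quarter of the return distribution. An agent that ranks actions by such distorted averages prefers actions with better worst-case outcomes.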

Paper Structure

This paper contains 11 sections, 24 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Network architectures for DQN and recent distributional RL algorithms.
  • Figure 2: Effect of varying $N$ and $N'$, the number of samples used in the loss function in Equation \ref{eqn:iqn_loss}. Figures show human-normalized agent performance, averaged over six Atari games, over the first 10M frames of training (left) and the last 10M frames of training (right). Corresponding values for baselines: DQN ($32, 253$) and QR-DQN ($144, 1243$).
  • Figure 3: Effects of various changes to the sampling distribution, that is, various cumulative probability weightings.
  • Figure 4: Human-normalized mean (left) and median (right) scores on Atari-57 for IQN and various other algorithms. Random seeds shown as traces, with IQN averaged over 5, QR-DQN over 3, and Rainbow over 2 random seeds.
  • Figure 5: Comparison of architectural variants.
  • ...and 2 more figures