IGN : Implicit Generative Networks

Haozheng Luo, Tianyi Wu, Colin Feiyu Han, Zhijun Yan

TL;DR

This work tackles distributional reinforcement learning by modeling the full return distribution under a policy with the Implicit Generative Network (IGN), a GAN-augmented variant of the Implicit Quantile Network (IQN). It defines the target return as $Y=\sum_{t=0}^{\infty} \gamma^{t} R_t$ and uses a min–max objective over a generator $\mathbb{G}$ and a 1-Lipschitz discriminator to approximate $f_Y^{\pi}(y|s,a)$ through a distributional Bellman equation. The authors report state-of-the-art results on the Atari-57 benchmark, with competitive policy optimization, faster policy evaluation (fewer timesteps), and support for risk-sensitive policies. They also acknowledge the higher computational cost of GAN training and outline future work to improve efficiency with attention and kernel methods.
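
For orientation, the two ingredients named above can be written in their standard textbook forms. This is only a sketch: the recursion is the usual distributional Bellman equation, written with the summary's symbol $Y$, and the objective is the usual Wasserstein-GAN form. The symbols $D$, $\mathcal{D}_{1\text{-Lip}}$, $P_r$, $P_g$, $z$, $S'$ and $A'$ are assumptions introduced here, and the paper's exact objective may differ.

$$
Y(s,a) \overset{D}{=} R(s,a) + \gamma\, Y(S', A'), \qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S')
$$

$$
\min_{\mathbb{G}} \; \max_{D \in \mathcal{D}_{1\text{-Lip}}} \; \mathbb{E}_{y \sim P_r}\big[D(y)\big] - \mathbb{E}_{z \sim p(z)}\big[D(\mathbb{G}(z))\big]
$$

Here $P_r$ would be the distribution of Bellman targets $R + \gamma Y(S', A')$, and $P_g$ the distribution of samples $\mathbb{G}(z)$ produced by the generator. At the inner maximum over 1-Lipschitz functions, the objective equals the Wasserstein-1 distance between $P_r$ and $P_g$.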

Abstract

In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by combining the generator and discriminator of a GAN with quantile regression to approximate the full quantile function of the state-action return distribution. We demonstrate improved performance on our baseline benchmark, the 57 Atari 2600 games in the ALE. We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games, for both policy optimization and evaluation.
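
To make the quantile-regression component concrete, here is a minimal sketch of the quantile Huber loss commonly used in QR-DQN and IQN-style methods. It is not taken from the paper: the function name `quantile_huber_loss`, its arguments, and the default `kappa=1.0` are assumptions chosen for illustration, and the IGN training loss may combine this term with the GAN objective in a different way.

```python
def quantile_huber_loss(td_errors, taus, kappa=1.0):
    """Mean quantile Huber loss over paired (error, quantile fraction) samples.

    td_errors: iterable of floats, each a target sample minus the predicted
               quantile value (hypothetical naming, not from the paper).
    taus:      iterable of quantile fractions in (0, 1), one per error.
    kappa:     Huber threshold; errors within kappa are penalized quadratically.
    """
    total = 0.0
    count = 0
    for u, tau in zip(td_errors, taus):
        abs_u = abs(u)
        # Huber loss: quadratic near zero, linear in the tails.
        if abs_u <= kappa:
            huber = 0.5 * u * u
        else:
            huber = kappa * (abs_u - 0.5 * kappa)
        # Asymmetric weight: tau when the prediction is too low (u >= 0),
        # 1 - tau when it is too high (u < 0).
        indicator = 1.0 if u < 0 else 0.0
        total += abs(tau - indicator) * huber / kappa
        count += 1
    return total / count


# Usage: at tau = 0.9, underestimating the target costs more than overestimating.
under = quantile_huber_loss([2.0], [0.9])
over = quantile_huber_loss([-2.0], [0.9])
print(under > over)
```

The asymmetric weight is what pushes each prediction toward the `tau`-th quantile of the target distribution rather than toward its mean.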

Paper Structure (19 sections, 17 equations, 9 figures, 2 algorithms)


Figures (9)

  • Figure 1: Atari Game
  • Figure 2: GAN configurations [freirich2019distributional]: (a) Bellman-GAN; (b) WGAN [gulrajani2017improved]
  • Figure 3: Pong environment for Mean Q-value and W-distance graph based on IQN
  • Figure 4: Pong environment for Mean Q-value and W-distance graph based on IGN
  • Figure 5: Q-distribution based on 150K training steps
  • ...and 4 more figures