IGN : Implicit Generative Networks

Haozheng Luo; Tianyi Wu; Colin Feiyu Han; Zhijun Yan

TL;DR

This work tackles distributional reinforcement learning by modeling the full return distribution under a policy with a GAN-augmented Implicit Quantile Network (IGN). It defines the return $Y$ as $Y=\sum_{t=0}^{\infty} \gamma^{t} R_t$ and uses a min–max objective over a generator $\mathbb{G}$ and a 1-Lipschitz discriminator to approximate the return distribution $f_Y^{\pi}(y|s,a)$ through a distributional Bellman equation. The approach reports state-of-the-art results on the Atari-57 benchmark, with competitive policy optimization and faster policy evaluation (fewer time steps), while enabling risk-sensitive policies. The authors acknowledge higher computational cost due to GAN training and outline future work on improving efficiency with attention and kernel methods.
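
For orientation, the two ingredients named above can be written in their standard textbook forms. This is a sketch, not necessarily the paper's exact formulation: the return random variable $Z^{\pi}$, the discriminator symbol $D$, the transition kernel $P$, and the uniform sample $\tau$ fed to the generator are notational assumptions introduced here. The first line is the usual distributional Bellman equation (equality in distribution); the second is the usual Wasserstein-GAN objective with a 1-Lipschitz critic.

$$
Z^{\pi}(s,a) \overset{D}{=} R(s,a) + \gamma Z^{\pi}(S',A'), \qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S')
$$

$$
\min_{\mathbb{G}} \; \max_{\|D\|_{L} \le 1} \; \mathbb{E}_{y \sim f_Y^{\pi}(\cdot \mid s,a)}\big[D(y)\big] - \mathbb{E}_{\tau \sim U(0,1)}\big[D(\mathbb{G}(s,a,\tau))\big]
$$

Under this reading, the generator produces return samples for a state-action pair, and the critic estimates the Wasserstein-1 distance between those samples and Bellman targets.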

Abstract

In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by combining the generator and discriminator functions of a GAN with quantile regression to approximate the full quantile function of the state-action return distribution. We demonstrate improved performance on our baseline dataset, the 57 Atari 2600 games in the ALE. We also use our algorithm to train risk-sensitive policies in Atari games, showing state-of-the-art performance in both policy optimization and policy evaluation.
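
To make the quantile-regression ingredient concrete, here is a minimal sketch of the standard pinball (quantile-regression) loss on toy data. It is not the authors' implementation: the function names `pinball_loss` and `fit_quantile`, the toy returns, and the brute-force grid search are all assumptions chosen for illustration. IQN-style agents minimize a smoothed variant of this loss with gradient descent instead.

```python
import numpy as np

def pinball_loss(prediction, samples, tau):
    """Average quantile-regression (pinball) loss of a scalar prediction at level tau."""
    u = np.asarray(samples, dtype=float) - prediction  # residuals: sample minus prediction
    # Under-estimates (u >= 0) are weighted by tau, over-estimates by (1 - tau).
    return float(np.mean(np.where(u >= 0, tau * u, (tau - 1.0) * u)))

def fit_quantile(samples, tau, candidates):
    """Return the candidate with the lowest pinball loss (brute-force illustration only)."""
    losses = [pinball_loss(c, samples, tau) for c in candidates]
    return float(candidates[int(np.argmin(losses))])

# Hypothetical toy "returns": 1, 2, ..., 101.
returns = np.arange(1.0, 102.0)
grid = np.arange(0.0, 102.5, 0.5)  # candidate quantile values

median_estimate = fit_quantile(returns, 0.5, grid)  # minimizer at tau = 0.5
upper_estimate = fit_quantile(returns, 0.9, grid)   # minimizer at tau = 0.9
print(median_estimate, upper_estimate)
```

On this toy data the minimizers are the empirical median (51.0) and the empirical 0.9-quantile (91.0), which is the property that lets a network trained with this loss at many values of `tau` represent a full return distribution.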

Paper Structure (19 sections, 17 equations, 9 figures, 2 algorithms)

Introduction
Related Work
Dataset
Quantile Networks and Transfer Learning
Existing Offline Policy Evaluation Methods
Gradient Method
Assumption
Method
Task Definition
Distributional Reinforcement Learning
Quantile Regression Approaching
Distributional Bellman Equation
Estimation and Algorithm
Mathematical Proof
Theorem
...and 4 more sections

Figures (9)

Figure 1: Atari Game
Figure 2: GAN configurations: (a) Bellman-GAN (Freirich et al., 2019); (b) WGAN (Gulrajani et al., 2017)
Figure 3: Pong environment for Mean Q-value and W-distance graph based on IQN
Figure 4: Pong environment for Mean Q-value and W-distance graph based on IGN
Figure 5: Q-distribution based on 150K training steps
...and 4 more figures
