iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

Aidan Scannell; Kalle Kujanpää; Yi Zhao; Mohammadreza Nakhaei; Arno Solin; Joni Pajarinen

TL;DR

The paper tackles the data-hungry nature of reinforcement learning by proposing iQRL, a task-agnostic representation learning approach that relies solely on a latent-state consistency objective and Finite Scalar Quantization to produce implicitly quantized latent representations. By combining an encoder, a latent-space dynamics model, and a quantization step, and then applying a model-free TD3 agent in the latent space, iQRL achieves high sample efficiency and robust performance on the DeepMind Control Suite without reconstruction or reward prediction components. Key findings include preservation of latent-space rank through quantization, avoidance of representation and dimensional collapse, and the observation that reconstruction or reward-prediction losses are not necessary for learning effective representations. The method is simple, compatible with existing model-free RL algorithms, and yields task-agnostic representations that scale to high-dimensional control, with potential applicability to multi-task settings and stochastic environments in future work.
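As a rough illustration of the quantization step mentioned above, the following is a minimal sketch of Finite Scalar Quantization: each latent channel is squashed to a bounded range and rounded to one of a small number of integer levels, so the codebook is implicit (the product of the per-channel level counts). The function names, the restriction to odd level counts, and the example level counts are assumptions made for this sketch, not details taken from the paper. The straight-through gradient estimator typically used to train through the rounding step is omitted.

```python
import math


def fsq_quantize(z, levels):
    """Quantize each latent channel to one of `levels[i]` integer values.

    Hypothetical helper: assumes odd level counts so the codes are
    symmetric around zero, e.g. 5 levels -> {-2, -1, 0, 1, 2}.
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    codes = []
    for value, num_levels in zip(z, levels):
        if num_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts")
        half = (num_levels - 1) / 2
        bounded = math.tanh(value) * half  # squash into [-half, half]
        codes.append(int(round(bounded)))  # snap to the nearest integer level
    return codes


def codebook_size(levels):
    """Size of the implicit codebook: the product of per-channel levels."""
    return math.prod(levels)


def code_index(codes, levels):
    """Map a tuple of per-channel codes to a single mixed-radix index."""
    index = 0
    for code, num_levels in zip(codes, levels):
        shifted = code + (num_levels - 1) // 2  # shift into 0..num_levels-1
        index = index * num_levels + shifted
    return index
```

With three channels of five levels each, the implicit codebook has 125 entries, and no codebook vectors need to be stored or learned.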

Abstract

Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite.
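The latent-state consistency idea described in the abstract can be sketched as follows: a dynamics model rolls a latent state forward through a sequence of actions, and each predicted latent is compared with the latent that an encoder produces for the observation actually seen at that step. The abstract does not specify the distance measure or the horizon weighting, so the cosine-based distance, the `discount` parameter, and all function names below are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def consistency_loss(z0, actions, target_latents, dynamics, discount=0.9):
    """Multi-step latent consistency loss (hypothetical formulation).

    z0:             latent of the first observation
    actions:        actions taken at each step
    target_latents: encoder outputs for the observations that followed
    dynamics:       callable (latent, action) -> predicted next latent
    """
    z = z0
    loss = 0.0
    for step, (action, target) in enumerate(zip(actions, target_latents)):
        z = dynamics(z, action)  # predict the next latent from the current one
        loss += discount ** step * (1.0 - cosine_similarity(z, target))
    return loss / len(actions)


def toy_dynamics(z, action):
    """Stand-in for a learned dynamics network: simply adds the action."""
    return [zi + ai for zi, ai in zip(z, action)]
```

Note that a loss of this form has a trivial minimum when the encoder maps every observation to the same latent, which is the collapse that the quantization step is reported to prevent empirically.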

Paper Structure (38 sections, 7 equations, 11 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Representation learning
Observation reconstruction
Latent-state consistency
Latent-state consistency for model-based RL
Contrastive learning
Preliminaries
Representation collapse
Method
Method components
Representation learning
Quantization
Model-free reinforcement learning
Experiments
...and 23 more sections

Figures (11)

Figure 1: Overview. iQRL is a stand-alone representation learning technique that is compatible with any model-free RL algorithm (we use TD3; Fujimoto et al., 2018). Importantly, iQRL quantizes the latent representation with Finite Scalar Quantization (FSQ), using only a self-supervised latent-state consistency loss, i.e. no decoder (see the representation loss, eq:rep-loss). Making the latent representation discrete with an implicit codebook contributes to the very high sample efficiency of iQRL and empirically prevents representation collapse. Thanks to the FSQ-based quantization, iQRL does not need a reward prediction head to prevent representation collapse, a well-known issue with self-supervised learning, making the representation task-agnostic.
Figure 2: DeepMind Control Suite results. iQRL (red) is significantly more sample efficient than other model-free baselines TCRL (green), TD7 (purple), TACO (blue) and TD3 (orange). iQRL performs particularly well in the high-dimensional locomotion tasks and outperforms TCRL, which is the most similar baseline. Results are for 20 DMC tasks with UTD=1. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes. See the per-task results figure (fig:dmc_grid) for results in other DMC tasks.
Figure 3: Ablation of quantization. In all tasks, our FSQ scheme prevents dimensional collapse: the rank of the representation remains high (red). In contrast, when our quantization is not used (blue), the representation undergoes dimensional collapse, indicated by the falling rank. In the Dog Run task, this results in the agent not learning to solve the task. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
Figure 4: Reward prediction is not necessary for representation learning. We compare iQRL to a variant of our method with a reward prediction head trained to predict the reward from the current latent state. Adding a reward prediction head to iQRL leads to a slight increase in performance in Dog Run, but slightly harms sample efficiency in Humanoid Walk and Quadruped Run. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
Figure 5: Reconstruction loss has a detrimental impact. Unlike many methods, such as SAC-AE (Yarats et al., 2021), iQRL has neither an observation decoder nor a reconstruction term in the loss function. We show that adding a reconstruction loss harms the performance of iQRL across a mixture of easy and hard evaluation environments. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
...and 6 more figures

Theorems & Definitions (2)

Definition 3.1: Complete representation collapse
Definition 3.2: Dimensional collapse
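
The two definitions are listed here by title only. As a hedged illustration of how the two kinds of collapse are commonly distinguished in practice, the sketch below computes the rank of a mean-centered batch of latent vectors: a rank of zero means every observation maps to the same latent (complete collapse), while a rank below the latent dimension means the latents occupy a lower-dimensional subspace (dimensional collapse). The function names, the tolerance, and the use of plain Gaussian elimination are assumptions of this sketch; the paper's exact rank measure is not given in this text.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination with pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def centered_rank(latents, tol=1e-9):
    """Rank of a batch of latents after subtracting the batch mean."""
    n = len(latents)
    dim = len(latents[0])
    mean = [sum(z[d] for z in latents) / n for d in range(dim)]
    centered = [[z[d] - mean[d] for d in range(dim)] for z in latents]
    return matrix_rank(centered, tol)


# All latents identical: complete collapse.
collapsed = [[1.0, 2.0, 3.0]] * 4
# Latents vary along a single direction in 3-D: dimensional collapse.
on_a_line = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
# Latents span all three dimensions: no collapse.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

For real-valued network outputs, a tolerance-based rank from singular values is the more usual choice; elimination is used here only to keep the sketch dependency-free.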
