iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning

Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen

TL;DR

The paper tackles the data-hungry nature of reinforcement learning by proposing iQRL, a task-agnostic representation learning approach that relies solely on a latent-state consistency objective and Finite Scalar Quantization to produce implicitly quantized latent representations. By combining an encoder, a latent-space dynamics model, and a quantization step, and then applying a model-free TD3 agent in the latent space, iQRL achieves high sample efficiency and robust performance on the DeepMind Control Suite without reconstruction or reward prediction components. Key findings include preservation of latent-space rank through quantization, avoidance of representation and dimensional collapse, and the observation that reconstruction or reward-prediction losses are not necessary for learning effective representations. The method is simple, compatible with existing model-free RL algorithms, and yields task-agnostic representations that scale to high-dimensional control, with potential applicability to multi-task settings and stochastic environments in future work.
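
To make the quantization step concrete, here is a minimal sketch of Finite Scalar Quantization on a single latent vector. It is an illustration under stated assumptions, not the authors' implementation: the function names `fsq_quantize` and `implicit_codebook_size` are hypothetical, every channel is assumed to use an odd number of levels so the grid is symmetric around zero, and the gradient handling needed for training (typically a straight-through estimator) is left out.

```python
import math


def fsq_quantize(z, levels):
    """Snap each latent channel to one of `levels[i]` evenly spaced values in [-1, 1].

    Simplifying assumption: every entry of `levels` is odd (>= 3).
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    quantized = []
    for value, num_levels in zip(z, levels):
        if num_levels < 3 or num_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts >= 3")
        half = (num_levels - 1) / 2            # 5 levels -> integer grid {-2, ..., 2}
        bounded = math.tanh(value) * half      # squash the channel into (-half, half)
        quantized.append(round(bounded) / half)  # round to the grid, rescale to [-1, 1]
    return quantized


def implicit_codebook_size(levels):
    """Number of distinct codes: the product of the per-channel level counts."""
    return math.prod(levels)


if __name__ == "__main__":
    latent = [0.0, 10.0, -10.0, 0.3]   # made-up encoder output
    levels = [5, 5, 5, 5]              # made-up level counts
    print(fsq_quantize(latent, levels))
    print(implicit_codebook_size(levels))
```

The codebook is implicit in the sense that no embedding table is stored or learned: the set of reachable codes is fully determined by the per-channel level counts.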

Abstract

Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite.
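
The latent-state consistency idea described above can be sketched as follows. The exact form of the loss is not given in this summary, so this example assumes a discounted sum of (1 - cosine similarity) between latents predicted by rolling the dynamics model forward and target latents obtained by encoding the observed next states. The names `rollout`, `consistency_loss`, and `toy_dynamics`, as well as the discount value, are hypothetical. In iQRL both sets of latents would additionally pass through the quantizer, which is omitted here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rollout(z0, actions, dynamics):
    """Predict future latents by applying the dynamics model step by step."""
    latents, z = [], z0
    for action in actions:
        z = dynamics(z, action)
        latents.append(z)
    return latents


def consistency_loss(predicted_latents, target_latents, discount=0.9):
    """Discounted sum of (1 - cosine similarity) over the rollout steps."""
    if len(predicted_latents) != len(target_latents):
        raise ValueError("need one target latent per predicted latent")
    loss = 0.0
    for step, (pred, target) in enumerate(zip(predicted_latents, target_latents)):
        loss += (discount ** step) * (1.0 - cosine_similarity(pred, target))
    return loss


def toy_dynamics(z, action):
    """Stand-in for a learned dynamics model: adds the action to the latent."""
    return [zi + ai for zi, ai in zip(z, action)]


if __name__ == "__main__":
    predicted = rollout([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], toy_dynamics)
    targets = [[1.0, 0.1], [0.9, 2.0]]  # made-up encoder outputs for the next observations
    print(consistency_loss(predicted, targets))
```

Note that a loss of this kind uses no rewards and no observation decoder, which is what makes the resulting representation task-agnostic. On its own it also admits a trivial solution in which all latents coincide, which is the collapse that the quantization step is reported to prevent empirically.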

Paper Structure (38 sections, 7 equations, 11 figures, 3 tables, 1 algorithm)

Figures (11)

  • Figure 1: Overview. iQRL is a stand-alone representation learning technique that is compatible with any model-free RL algorithm (we use TD3; Fujimoto et al., 2018). Importantly, iQRL quantizes the latent representation with Finite Scalar Quantization (FSQ), using only a self-supervised latent-state consistency loss, i.e. no decoder (see the representation loss equation). Making the latent representation discrete with an implicit codebook contributes to the very high sample efficiency of iQRL and empirically prevents representation collapse. Thanks to the FSQ-based quantization, iQRL does not need a reward prediction head to prevent representation collapse, a well-known issue with self-supervised learning, making the representation task-agnostic.
  • Figure 2: DeepMind Control Suite results. iQRL (red) is significantly more sample efficient than the other model-free baselines: TCRL (green), TD7 (purple), TACO (blue) and TD3 (orange). iQRL performs particularly well in the high-dimensional locomotion tasks and outperforms TCRL, which is the most similar baseline. Results are for 20 DMC tasks with UTD=1. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes. See the per-task results grid (fig:dmc_grid) for results in other DMC tasks.
  • Figure 3: Ablation of quantization. We show how our quantization scheme prevents dimensional collapse. In all tasks, the rank of the representation remains high when our FSQ scheme is used (red). In contrast, when our quantization is not used (blue), the representation undergoes dimensional collapse, indicated by the falling rank. In the Dog Run task, this results in the agent not learning to solve the task. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
  • Figure 4: Reward prediction is not necessary for representation learning. We compare iQRL to a variant of our method with a reward prediction head trained to predict the reward from the current latent state. Adding a reward prediction head to iQRL leads to a slight increase in performance in Dog Run, but slightly harms sample efficiency in Humanoid Walk and Quadruped Run. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
  • Figure 5: Reconstruction loss has a detrimental impact. Unlike many methods, such as SAC-AE (Yarats et al., 2021), iQRL has neither an observation decoder nor a reconstruction term in the loss function. We show that adding a reconstruction loss harms the performance of iQRL across a mixture of easy and hard evaluation environments. We plot the mean (solid line) and the $95\%$ confidence intervals (shaded) across 5 random seeds, where each seed averages over 10 evaluation episodes.
  • ...and 6 more figures
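
The rank-based diagnosis of dimensional collapse referred to in Figure 3 can be illustrated with a small sketch. The paper's own rank estimator is not specified in this summary, so the example below makes its own choices: it subtracts the batch mean from a batch of latent vectors and counts pivots in a tolerance-based row reduction. The names `matrix_rank` and `latent_rank` and the tolerance value are hypothetical. In practice a singular-value-based estimate would be a common alternative.

```python
def matrix_rank(rows, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def latent_rank(latents, tol=1e-8):
    """Rank of a batch of latent vectors after subtracting the batch mean."""
    n = len(latents)
    dim = len(latents[0])
    mean = [sum(z[i] for z in latents) / n for i in range(dim)]
    centered = [[z[i] - mean[i] for i in range(dim)] for z in latents]
    return matrix_rank(centered, tol)


if __name__ == "__main__":
    healthy = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]        # spans all 3 dimensions
    on_a_line = [[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]     # varies along 1 direction
    constant = [[1, 2], [1, 2], [1, 2]]                           # every latent identical
    print(latent_rank(healthy), latent_rank(on_a_line), latent_rank(constant))
```

Under this reading, a rank well below the latent dimension corresponds to dimensional collapse, and a rank of zero corresponds to the case where every observation maps to the same latent.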

Theorems & Definitions (2)

  • Definition 3.1: Complete representation collapse
  • Definition 3.2: Dimensional collapse