iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning
Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaei, Arno Solin, Joni Pajarinen
TL;DR
The paper tackles the sample inefficiency of reinforcement learning by proposing iQRL, a task-agnostic representation learning approach that relies solely on a latent-state consistency objective and Finite Scalar Quantization (FSQ) to produce implicitly quantized latent representations. It combines an encoder, a latent-space dynamics model, and a quantization step, then trains a model-free TD3 agent in the latent space; this achieves high sample efficiency and robust performance on the DeepMind Control Suite without reconstruction or reward-prediction components. Key findings are that quantization preserves the rank of the latent space, that the method avoids representation and dimensional collapse, and that reconstruction and reward-prediction losses are not necessary for learning effective representations. The method is simple, compatible with existing model-free RL algorithms, and yields task-agnostic representations that scale to high-dimensional control. Multi-task settings and stochastic environments are named as potential applications for future work.
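To make the quantization step concrete, the following is a minimal sketch of Finite Scalar Quantization applied to a single latent vector. It is an illustration under stated assumptions, not the authors' implementation: the function names `fsq_quantize` and `codebook_size` are hypothetical, only odd level counts are handled, and the straight-through gradient estimator that FSQ relies on during training is omitted because the sketch uses plain Python without automatic differentiation.

```python
import math


def fsq_quantize(z, levels):
    """Snap each latent dimension onto a small fixed grid.

    z: list of floats (unbounded encoder outputs).
    levels: list of ints, number of grid points per dimension.
            Assumption: odd values >= 3 only, to keep the sketch simple.
    Returns a list of floats on a uniform grid in [-1, 1].
    """
    if len(z) != len(levels):
        raise ValueError("z and levels must have the same length")
    quantized = []
    for value, n_levels in zip(z, levels):
        if n_levels < 3 or n_levels % 2 == 0:
            raise ValueError("this sketch supports odd level counts >= 3")
        half = (n_levels - 1) // 2
        bounded = math.tanh(value) * half  # squash into (-half, half)
        code = round(bounded)              # nearest integer grid point
        quantized.append(code / half)      # rescale to [-1, 1]
    return quantized


def codebook_size(levels):
    """Number of distinct quantized vectors; the codebook is implicit."""
    return math.prod(levels)


if __name__ == "__main__":
    print(fsq_quantize([0.0, 10.0, -10.0], [5, 5, 5]))
    print(codebook_size([5, 5, 5]))
```

No codebook is stored or learned here: the set of reachable vectors is fixed by the choice of `levels`, which is one reading of "implicitly quantized" in the summary above.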
Abstract
Learning representations for reinforcement learning (RL) has shown much promise for continuous control. We propose an efficient representation learning method using only a self-supervised latent-state consistency loss. Our approach employs an encoder and a dynamics model to map observations to latent states and predict future latent states, respectively. We achieve high performance and prevent representation collapse by quantizing the latent representation such that the rank of the representation is empirically preserved. Our method, named iQRL: implicitly Quantized Reinforcement Learning, is straightforward, compatible with any model-free RL algorithm, and demonstrates excellent performance by outperforming other recently proposed representation learning methods in continuous control benchmarks from DeepMind Control Suite.
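The latent-state consistency objective described in the abstract can be sketched as follows. This is a hedged illustration only: the abstract does not state the form of the loss, so negative cosine similarity is an assumption, as are the separate target encoder and the names `cosine_similarity` and `rollout_consistency_loss`. The encoder and dynamics model are passed in as plain callables, and gradient handling (for example, stopping gradients through the targets) is left out.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def rollout_consistency_loss(encode, target_encode, dynamics,
                             observations, actions):
    """Mean negative cosine similarity over a multi-step latent rollout.

    encode / target_encode: observation -> latent (lists of floats).
    dynamics: (latent, action) -> predicted next latent.
    observations: T + 1 observations; actions: T actions.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    latent = encode(observations[0])
    total = 0.0
    for step, action in enumerate(actions):
        latent = dynamics(latent, action)              # predict in latent space
        target = target_encode(observations[step + 1])  # encode the real next obs
        total -= cosine_similarity(latent, target)
    return total / len(actions)


if __name__ == "__main__":
    identity = lambda obs: list(obs)
    add = lambda z, a: [zi + ai for zi, ai in zip(z, a)]
    obs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    acts = [[0.0, 1.0], [0.0, 1.0]]
    print(rollout_consistency_loss(identity, identity, add, obs, acts))
```

In this toy setup a perfect dynamics model reaches the minimum of about -1, but so does an encoder that maps every observation to the same vector. That degenerate solution is the representation collapse the abstract refers to, and it is why the objective is paired with quantization rather than used alone.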
