PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

Rico Jonschkowski; Roland Hafner; Jonathan Scholz; Martin Riedmiller

TL;DR

PVEs propose unsupervised learning of structured state representations by splitting latent space into position and velocity components, with velocity inferred via finite differences. The method uses robotic priors—variation, slowness, inertia, conservation, and controllability—to train encoders without decoders or reconstruction, yielding low-dimensional, task-relevant representations from pixel observations. Across three MuJoCo tasks, PVEs recover meaningful topologies, admit consistent representations across viewpoints, and enable reinforcement learning with improved or comparable control performance. Ball in cup remains challenging due to rapid dynamics and noisy velocity estimates, guiding future work toward richer priors and tighter integration with RL.

Abstract

We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects. PVEs encode a single image into a low-dimensional position state and compute the velocity state from finite differences in position. In contrast to autoencoders, position-velocity encoders are not trained by image reconstruction, but by making the position-velocity representation consistent with priors about interacting with the physical world. We applied PVEs to several simulated control tasks from pixels and achieved promising preliminary results.
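The position-velocity split described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical linear encoder (`encode_position` with a `weights` matrix) in place of the learned network, and a hypothetical time step `dt`; neither name comes from the paper, and the exact scaling of the velocity is an assumption. Only the idea of encoding each image to a position and differencing consecutive positions is taken from the text.

```python
import numpy as np

def encode_position(image, weights):
    """Hypothetical stand-in for the learned encoder: flatten one image
    and map it linearly to a low-dimensional position vector."""
    return weights @ image.ravel()

def position_velocity_states(images, weights, dt=1.0):
    """Encode each image to a position, then estimate velocity by a
    backward finite difference between consecutive positions.

    Returns an array of shape (T - 1, 2 * d): the state at step t is the
    position at t concatenated with (p_t - p_{t-1}) / dt.
    `dt` is an assumed time step, not a value from the paper.
    """
    positions = np.stack([encode_position(img, weights) for img in images])
    velocities = (positions[1:] - positions[:-1]) / dt
    return np.concatenate([positions[1:], velocities], axis=1)

# Toy usage: three 2x2 "images" and a 1-D position space.
images = np.arange(12, dtype=float).reshape(3, 2, 2)
weights = np.ones((1, 4))  # assumed encoder parameters
states = position_velocity_states(images, weights)
```

Because velocity is derived from two encoded positions rather than predicted separately, noise in the position estimates carries over directly into the velocity estimates.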
