Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics

Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat

TL;DR

This paper tackles sample efficiency in vision-based reinforcement learning (RL) for real-robot control by decoupling state representation learning (SRL) from policy learning. It evaluates several SRL methods on goal-based robotics tasks and introduces SRL Splits, a stacked representation that partitions the latent space to balance multiple learning objectives. The results show that SRL Splits can match or exceed end-to-end performance with fewer samples and that the method transfers to real robot settings, while random features remain a strong baseline. Overall, the work supports modular SRL approaches for more data-efficient, interpretable, and robust robotic RL.

Abstract

Scaling end-to-end reinforcement learning to control real robots from vision presents a series of challenges, in particular in terms of sample efficiency. In contrast to end-to-end learning, state representation learning can help learn a compact, efficient and relevant representation of states that speeds up policy learning, reduces the number of samples needed, and is easier to interpret. We evaluate several state representation learning methods on goal-based robotics tasks and propose a new unsupervised model that stacks representations and combines the strengths of several of these approaches. This method encodes all the relevant features, performs on par with or better than end-to-end learning with better sample efficiency, and is robust to hyper-parameter changes.

Paper Structure

This paper contains 17 sections, 2 equations, 12 figures, and 10 tables.

Figures (12)

  • Figure 1: SRL Splits model: combines the losses of an image ($I$) reconstruction model, a reward ($r$) prediction model and an inverse dynamics model, using two splits of the state representation $s$. Arrows represent model learning and inference, dashed frames represent loss computation, rectangles are state representations, circles are real observed data, and squares are model predictions.
  • Figure 2: Environments for state representation learning from the S-RL Toolbox (Raffin18), with extensions (2D Simulated + Real Omnibot).
  • Figure 3: Performance (mean and standard error over 8 runs) of the PPO algorithm with different state representations learned in the Simulated OmniRobot environment with a randomly initialized target.
  • Figure 4: Performance (mean and standard error over 10 runs) of the PPO algorithm with different state representations learned in the Navigation 1D target environment.
  • Figure 5: Performance (mean and standard error over 10 runs) of the PPO algorithm with different state representations learned in the Navigation 2D random target environment.
  • ...and 7 more figures
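
The Figure 1 caption describes a state representation $s$ divided into two splits, with reconstruction, reward prediction and inverse dynamics losses combined on top of them. The following is a minimal NumPy sketch of that idea, not the authors' implementation. Several details are assumptions made for illustration only: the first split feeds the reconstruction and reward heads, the second split feeds the inverse dynamics head, all heads are linear, rewards and actions are discrete classes, and the names `split_state`, `srl_splits_loss`, `W_dec`, `W_rew`, `W_inv` and the loss weights are hypothetical.

```python
import numpy as np


def split_state(s, split_dim):
    """Partition a batch of states (batch, state_dim) into two splits."""
    return s[:, :split_dim], s[:, split_dim:]


def mse(pred, target):
    """Mean squared error averaged over all elements."""
    return float(np.mean((pred - target) ** 2))


def cross_entropy(logits, labels):
    """Numerically stable softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def srl_splits_loss(s_t, s_next, image, reward_label, action, params,
                    split_dim, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three losses computed on two splits of the state.

    Assumption (illustration only): split 1 -> reconstruction + reward,
    split 2 -> inverse dynamics. All heads are plain linear maps.
    """
    first_t, second_t = split_state(s_t, split_dim)
    first_next, second_next = split_state(s_next, split_dim)

    # Reconstruction: decode the (flattened) image from the first split.
    l_rec = mse(first_t @ params["W_dec"], image)

    # Reward prediction: classify the reward from the first split at t and t+1.
    reward_logits = np.concatenate([first_t, first_next], axis=1) @ params["W_rew"]
    l_rew = cross_entropy(reward_logits, reward_label)

    # Inverse dynamics: classify the action from the second split at t and t+1.
    action_logits = np.concatenate([second_t, second_next], axis=1) @ params["W_inv"]
    l_inv = cross_entropy(action_logits, action)

    w_rec, w_rew, w_inv = weights
    parts = {"reconstruction": w_rec * l_rec,
             "reward": w_rew * l_rew,
             "inverse": w_inv * l_inv}
    return sum(parts.values()), parts


# Toy usage with random data (all sizes are arbitrary).
rng = np.random.default_rng(0)
batch, state_dim, split_dim, img_dim, n_rewards, n_actions = 4, 6, 4, 12, 2, 4
params = {
    "W_dec": rng.normal(size=(split_dim, img_dim)),
    "W_rew": rng.normal(size=(2 * split_dim, n_rewards)),
    "W_inv": rng.normal(size=(2 * (state_dim - split_dim), n_actions)),
}
total, parts = srl_splits_loss(
    rng.normal(size=(batch, state_dim)),
    rng.normal(size=(batch, state_dim)),
    rng.normal(size=(batch, img_dim)),
    rng.integers(0, n_rewards, size=batch),
    rng.integers(0, n_actions, size=batch),
    params, split_dim,
)
print(total, parts)
```

Because each loss only reads its own split, a change in one split leaves the losses attached to the other split untouched. In this sketch, that is the sense in which partitioning the latent space keeps the objectives from competing for the same dimensions.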