Unsupervised state representation learning with robotic priors: a robustness benchmark

Timothée Lesort, Mathieu Seurin, Xinrui Li, Natalia Díaz-Rodríguez, David Filliat

TL;DR

This work extends unsupervised state representation learning with robotic priors to high-dimensional RGB images, learning a 3D representation of a robot's hand position from its head camera. It implements four established priors as loss terms within a Siamese-network framework and introduces a fifth, the Reference Point Prior, to stabilize representations across sequences. It also introduces a new evaluation metric (KNN-MSE), used alongside NIEQA. The approach produces coherent, compact state spaces that outperform autoencoders and approach supervised performance on several datasets, while exposing robustness limits under static distractors and domain randomization. The results highlight the practicality of robotic priors for rapid, task-agnostic state learning and point toward real-robot transfer and integration with reward-forward models as future directions.

Abstract

Our understanding of the world depends highly on our capacity to produce intuitive and simplified representations that can easily be used to solve problems. We reproduce this simplification process using a neural network to build a low-dimensional state representation of the world from images acquired by a robot. As in Jonschkowski et al. (2015), we learn in an unsupervised way, using prior knowledge about the world expressed as loss functions called robotic priors, and extend this approach to higher-dimensional, richer images to learn a 3D representation of the hand position of a robot from RGB images. We propose a quantitative evaluation of the learned representation, based on nearest neighbors in the state space, that makes it possible to assess its quality and to show both the potential and the limitations of robotic priors in realistic environments. We increase the image size and add distractors and domain randomization, all crucial components for achieving transfer learning to real robots. Finally, we contribute a new prior to improve the robustness of the representation. The applications of such a low-dimensional state representation range from easing reinforcement learning (RL) and knowledge transfer across tasks to facilitating learning from raw data with more efficient and compact high-level representations. The results show that the robotic prior approach is able to extract a high-level representation, such as the 3D position of an arm, and organize it into a compact and coherent space of states on a challenging dataset.
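The nearest-neighbor evaluation mentioned in the abstract can be illustrated with a small sketch. The idea, as we read it, is that samples close together in the learned state space should also be close in ground truth (here, the hand position). The function name `knn_mse`, its signature, the brute-force neighbor search, and the default `k=5` are assumptions made for this illustration, not the authors' implementation.

```python
import math


def knn_mse(states, ground_truth, k=5):
    """Hypothetical KNN-MSE sketch (not the authors' code).

    For every sample, find its k nearest neighbors in the learned state
    space, then average the squared ground-truth distance to those
    neighbors. Lower values mean the learned neighborhoods agree better
    with the true ones.
    """
    n = len(states)
    if len(ground_truth) != n:
        raise ValueError("states and ground_truth must have the same length")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    total = 0.0
    for i in range(n):
        # Rank all other samples by Euclidean distance in the learned space.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(states[i], states[j]),
        )[:k]
        # Mean squared ground-truth distance to those k neighbors.
        total += sum(
            math.dist(ground_truth[i], ground_truth[j]) ** 2 for j in neighbors
        ) / k
    return total / n


# Toy usage: a learned space that preserves the ground-truth ordering
# should score lower than one that scrambles it.
truth = [(0.0,), (1.0,), (2.0,), (3.0,)]
faithful = [(0.0,), (2.0,), (4.0,), (6.0,)]   # scaled copy of the truth
scrambled = [(0.0,), (6.0,), (2.0,), (4.0,)]  # neighborhoods do not match
print(knn_mse(faithful, truth, k=1), knn_mse(scrambled, truth, k=1))
```

The brute-force search is quadratic in the number of samples, which is adequate for a sketch; a larger dataset would likely call for a spatial index.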

Paper Structure

This paper contains 16 sections, 6 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Example of neural net architecture with two Siamese networks and frozen feature extractors (ResNet).
  • Figure 2: Left: Baxter's camera view for Static-Button-Distractors (dataset 2). Right: Ground-truth position of Baxter's left hand and its coded reward.
  • Figure 3: A sample of each dataset (1-4), created for our benchmark
  • Figure 4: Learned state space on Static-Button-Distractors (dataset 2): Left: Denoising Autoencoder. Middle: 4 Priors. Right: 5 Priors. A red state (reward +1) means the button is being pushed, a gray state (reward -1) means the hand is out of sight, and a blue state (reward 0) means the hand is elsewhere.
  • Figure 5: Effect of static distractors in dataset 3 on the 4 priors approach learned state space.
  • ...and 2 more figures