Skill Transfer and Discovery for Sim-to-Real Learning: A Representation-Based Viewpoint

Haitong Ma; Zhaolin Ren; Bo Dai; Na Li

TL;DR

STEADY tackles the challenge of transferring sim-trained policies to real robots by learning task-agnostic skill representations via spectral decomposition and by discovering, from real data, new skills induced by the sim-to-real gap. It first learns simulator skill sets and a simulator policy, then extracts residual real-world dynamics with orthogonality-constrained discovery to build an enlarged skill space from which the real-world policy is synthesized. Experiments on Crazyflie 2.1 quadrotors show that STEADY improves real-world hovering, take-off, landing, and trajectory tracking, with gains of up to 30.2% over baselines. The approach offers a principled, sample-efficient pathway for generalizable sim-to-real transfer in nonlinear robotic systems.
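The orthogonality-constrained discovery step can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `orthogonality_penalty` and `project_out`, the feature matrices, and the choice of a squared Frobenius-norm penalty are all assumptions made for illustration. The sketch only shows one common way to measure, and remove, the overlap between newly learned features and features already learned in the simulator.

```python
import numpy as np

def orthogonality_penalty(phi_sim, phi_new):
    """Squared Frobenius norm of the empirical cross-correlation.

    phi_sim: (n, d) simulator skill features on n sampled (s, a) pairs.
    phi_new: (n, k) candidate new skill features on the same samples.
    Hypothetical penalty; zero iff the two feature sets are orthogonal
    on the sampled data.
    """
    n = phi_sim.shape[0]
    cross = phi_sim.T @ phi_new / n
    return float(np.sum(cross ** 2))

def project_out(phi_sim, phi_new):
    """Remove from phi_new its least-squares projection onto span(phi_sim)."""
    coef, *_ = np.linalg.lstsq(phi_sim, phi_new, rcond=None)
    return phi_new - phi_sim @ coef

rng = np.random.default_rng(0)
phi_sim = rng.normal(size=(256, 8))
# Candidate features that deliberately overlap with the simulator features.
phi_new = rng.normal(size=(256, 3)) + phi_sim[:, :3]

before = orthogonality_penalty(phi_sim, phi_new)
after = orthogonality_penalty(phi_sim, project_out(phi_sim, phi_new))
print(f"penalty before: {before:.4f}, after projection: {after:.2e}")
```

In a learned setting the penalty would typically be added to the training loss for the new features rather than enforced by an explicit projection; the projection is used here only because it makes the effect easy to check.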

Abstract

We study sim-to-real skill transfer and discovery in the context of robotics control using representation learning. We draw inspiration from the spectral decomposition of Markov decision processes. The spectral decomposition yields representations that can linearly represent the state-action value function induced by any policy, and these representations can therefore be regarded as skills. The skill representations are transferable across arbitrary tasks with the same transition dynamics. Moreover, to handle the sim-to-real gap in the dynamics, we propose a skill discovery algorithm that learns new skills caused by the sim-to-real gap from real-world data. We promote the discovery of new skills by enforcing orthogonality constraints between the skills to be learned and the skills from simulators, and then synthesize the policy using the enlarged skill sets. We demonstrate our methodology by transferring quadrotor controllers from simulators to Crazyflie 2.1 quadrotors. We show that we can learn the skill representations from a single simulator task and transfer them to multiple real-world tasks, including hovering, taking off, landing, and trajectory tracking. Our skill discovery approach helps narrow the sim-to-real gap and improves the real-world controller performance by up to 30.2%.
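The claim that the representation linearly represents the value function of any policy can be checked numerically on a toy problem. The sketch below assumes a small tabular MDP whose transition kernel factorizes as P(s'|s,a) = <phi(s,a), mu(s')> and whose reward is linear in phi; the sizes, the random features, and all variable names are hypothetical and not taken from the paper. It evaluates an arbitrary policy exactly and confirms that the resulting Q-function equals phi times a weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d, gamma = 6, 3, 4, 0.9

# Hypothetical low-rank MDP: P(s'|s,a) = <phi(s,a), mu(s')>.
# Rows of phi lie on the simplex and rows of mu are distributions over s',
# so every P(.|s,a) is a valid probability distribution.
phi = rng.dirichlet(np.ones(d), size=n_states * n_actions)  # (S*A, d)
mu = rng.dirichlet(np.ones(n_states), size=d)               # (d, S)
P = phi @ mu                                                # (S*A, S)

# Reward assumed linear in the same features.
theta_r = rng.uniform(size=d)
r = phi @ theta_r                                           # (S*A,)

# An arbitrary stochastic policy pi(a|s).
pi = rng.dirichlet(np.ones(n_actions), size=n_states)       # (S, A)

# Pi maps Q (indexed by s*A + a) to V: V(s) = sum_a pi(a|s) Q(s, a).
Pi = np.zeros((n_states, n_states * n_actions))
for s in range(n_states):
    Pi[s, s * n_actions:(s + 1) * n_actions] = pi[s]

# Exact policy evaluation: Q = r + gamma * P @ Pi @ Q.
Q = np.linalg.solve(np.eye(n_states * n_actions) - gamma * P @ Pi, r)

# Linear weights implied by the factorization: w = theta_r + gamma * mu @ V.
V = Pi @ Q
w = theta_r + gamma * mu @ V
max_err = np.max(np.abs(phi @ w - Q))
print(f"max |phi @ w - Q| = {max_err:.2e}")
```

Because the weight vector `w` depends on the policy and reward while `phi` depends only on the dynamics, the same features can be reused across tasks that share the transition dynamics, which is the sense in which they act as transferable skills.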

Paper Structure (26 sections, 30 equations, 7 figures, 1 table, 2 algorithms)

Introduction
Related works
Sim-to-Real Transfer
Domain randomization
Residual dynamics learning
Representation-Based Knowledge Transfer
Representation Learning via Spectral Decomposition
Preliminaries
Notations and Sim-to-Real Problem Setting
Spectral Decomposition and Skills in Markov Decision Processes
Skill Learning by Spectral Conditional Density Estimation
STEADY: Skill Transfer and Discovery for Sim-to-Real Learning
Learning Skill Sets in the Simulator
Skill Discovery from Real-World Data
Policy Synthesis for Real-World Tasks
...and 11 more sections

Figures (7)

Figure 1: Overview of the STEADY framework for sim-to-real learning. More information can be found on our project website: https://congharvard.github.io/steady-sim-to-real/.
Figure 2: Snapshots of taking off, hovering for 7 seconds, then landing with the simulator policy and with the policy improved by STEADY. Yellow dashed lines indicate the target hovering height (1 m). The Crazyflies are highlighted with red boxes. The snapshots are taken every 0.8 seconds. One panel shows the simulator policy and the other shows the policy learned by the proposed STEADY algorithm.
Figure 3: The experiment environment with Optitrack motion capture system.
Figure 4: Episodic return versus number of training samples during training with 5 random seeds. The shaded region indicates the 95% confidence interval over 10 evaluation episodes, run every 2000 samples, across the five random seeds.
Figure 5: Trajectories of taking off, hovering, and landing.
...and 2 more figures

Theorems & Definitions (1)

Definition 1: Spectral decomposition of MDPs (Jin et al., 2019; Agarwal et al., 2020)
