Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective

Haoran He; Peilin Wu; Chenjia Bai; Hang Lai; Lingxiao Wang; Ling Pan; Xiaolin Hu; Weinan Zhang

TL;DR

This paper formulates the sim-to-real gap as an information bottleneck problem and proposes a novel privileged knowledge distillation method, the Historical Information Bottleneck (HIB), which learns a privileged knowledge representation from historical trajectories by capturing the underlying changeable dynamic information.

Abstract

Reinforcement Learning (RL) has recently achieved remarkable success in robotic control. However, most works in RL operate in simulated environments where privileged knowledge (e.g., dynamics, surroundings, terrains) is readily available. Conversely, in real-world scenarios, robot agents usually rely solely on local states (e.g., proprioceptive feedback of robot joints) to select actions, leading to a significant sim-to-real gap. Existing methods address this gap by either gradually reducing the reliance on privileged knowledge or performing a two-stage policy imitation. However, we argue that these methods are limited in their ability to fully leverage the available privileged knowledge, resulting in suboptimal performance. In this paper, we formulate the sim-to-real gap as an information bottleneck problem and therefore propose a novel privileged knowledge distillation method called the Historical Information Bottleneck (HIB). In particular, HIB learns a privileged knowledge representation from historical trajectories by capturing the underlying changeable dynamic information. Theoretical analysis shows that the learned privileged knowledge representation helps reduce the value discrepancy between the oracle and learned policies. Empirical experiments on both simulated and real-world tasks demonstrate that HIB yields improved generalizability compared to previous methods. Videos of real-world experiments are available at https://sites.google.com/view/history-ib .
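As a hedged illustration of what "formulating the gap as an information bottleneck problem" can mean, the generic IB Lagrangian is written out below in notation assumed here rather than taken from the paper: $h_t$ is a fixed-length window of local states, $z_t$ is the learned representation (produced by a stochastic encoder $p_\psi(z_t \mid h_t)$ standing in for a history encoder), and $e_t$ is the privileged knowledge available in simulation. The second line is the standard variational IB loss, with an assumed decoder $q_\phi$ and prior $r(z)$; HIB's exact objective and mutual-information estimators may differ.

```latex
\max_{\psi}\; I(Z_t; E_t) \;-\; \beta\, I(H_t; Z_t), \qquad \beta > 0,

\mathcal{L}_{\mathrm{VIB}}(\psi, \phi)
  = \mathbb{E}\big[-\log q_\phi(e_t \mid z_t)\big]
  + \beta\, \mathbb{E}_{h_t}\Big[\mathrm{KL}\big(p_\psi(z_t \mid h_t)\,\big\|\,r(z_t)\big)\Big].
```

Because $I(Z_t; E_t) \ge H(E_t) + \mathbb{E}[\log q_\phi(e_t \mid z_t)]$ and $I(H_t; Z_t) \le \mathbb{E}[\mathrm{KL}(p_\psi(z_t \mid h_t) \,\|\, r(z_t))]$, minimizing $\mathcal{L}_{\mathrm{VIB}}$ maximizes a lower bound on the Lagrangian, up to the constant $H(E_t)$: the representation is pushed to keep what predicts the privileged knowledge and to discard the rest of the history.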

Paper Structure (26 sections, 5 theorems, 41 equations, 11 figures, 9 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Theoretical Analysis & Motivation
Value Discrepancy for Policy Generalization
Privilege Modeling Discrepancy
Methodology
Reducing the Discrepancy via MI
Historical Information Bottleneck
Experiments
Benchmarks and Compared Methods
Simulation Comparison
Visualization and Ablation Study
Real-world Application
Conclusion and Limitation
...and 11 more sections

Key Result

Theorem 1

The value discrepancy between the optimal value function with privileged knowledge and the value function with only the local state is bounded in terms of the policy divergence between $\pi^*$ and $\hat{\pi}$ and of $r_{\rm max}$, the maximum reward in each step.
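The quantities in Theorem 1 can be made concrete on a toy problem. The sketch below uses a hypothetical two-state, two-action MDP (every transition and reward is invented), evaluates an optimal reference policy `pi_star` and a perturbed policy `pi_hat` by iterative policy evaluation, and measures the policy divergence as the maximum per-state total-variation distance. It then compares the value gap with a standard performance-difference bound, $r_{\rm max} \cdot \epsilon / (1-\gamma)^2$, which holds for rewards in $[0, r_{\rm max}]$. That bound is a textbook stand-in chosen for illustration, not necessarily the form stated in Theorem 1, and the toy ignores privileged knowledge entirely: both policies act on the same state.

```python
# Hypothetical 2-state, 2-action deterministic MDP; every number is invented
# for illustration. P[s][a] lists (next_state, prob); rewards lie in [0, R_MAX].
GAMMA = 0.5
R_MAX = 1.0
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 0.2}}


def evaluate(policy, iters=500):
    """Iterative policy evaluation of V(s) = sum_a pi(a|s) [R(s,a) + gamma * E V(s')]."""
    v = {s: 0.0 for s in P}
    for _ in range(iters):
        v = {
            s: sum(
                policy[s][a] * (R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a]))
                for a in P[s]
            )
            for s in P
        }
    return v


def max_tv(pi_a, pi_b):
    """Policy divergence: maximum over states of the total-variation distance."""
    return max(
        0.5 * sum(abs(pi_a[s][a] - pi_b[s][a]) for a in pi_a[s]) for s in pi_a
    )


# pi_star is greedy w.r.t. its own Q-values in this MDP, hence optimal here.
pi_star = {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}}
# Hypothetical imperfect imitation of pi_star.
pi_hat = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.9, 1: 0.1}}

v_star = evaluate(pi_star)
v_hat = evaluate(pi_hat)
gap = max(abs(v_star[s] - v_hat[s]) for s in P)
eps = max_tv(pi_star, pi_hat)
# Standard performance-difference bound (illustrative, not Theorem 1's exact form).
bound = R_MAX * eps / (1.0 - GAMMA) ** 2

print(f"value gap = {gap:.4f}, policy divergence = {eps:.2f}, bound = {bound:.2f}")
```

Shrinking the divergence of `pi_hat` from `pi_star` shrinks both the measured gap and the bound, which is the intuition Theorem 1 formalizes for the privileged versus local-state setting.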

Figures (11)

Figure 1: Overall training framework. HIB adopts the IB principle to recover privileged knowledge from a fixed-length window of local history. The RL objective also provides gradients to the history encoder $f_\psi$, implying that the learned representation can be combined with any RL algorithm effectively.
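The pipeline in Figure 1, a fixed-length local history fed to a history encoder trained with an IB objective, can be sketched in plain Python. This is a minimal forward-pass sketch under stated assumptions, not the authors' implementation: the Gaussian encoder, the linear decoder that predicts the privileged vector, the variational-IB loss (squared prediction error plus `beta` times a KL term to a standard normal prior), and all names and sizes (`HISTORY_LEN`, `OBS_DIM`, `vib_loss`, and so on) are hypothetical. It computes the loss only; a real implementation would add the RL loss and propagate gradients into the encoder with an autodiff library, as the caption describes.

```python
import math
import random
from collections import deque

HISTORY_LEN = 4   # hypothetical window length (number of past local states)
OBS_DIM = 3       # hypothetical size of one local (proprioceptive) state
LATENT_DIM = 2    # hypothetical size of the latent z
PRIV_DIM = 2      # hypothetical size of the privileged vector e


class HistoryWindow:
    """Fixed-length buffer of recent local states, zero-padded at episode start."""

    def __init__(self, length, obs_dim):
        self.buf = deque([[0.0] * obs_dim for _ in range(length)], maxlen=length)

    def push(self, obs):
        self.buf.append(list(obs))  # the oldest entry is dropped automatically

    def flat(self):
        return [x for obs in self.buf for x in obs]


def linear(weights, bias, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_params(rng, in_dim, latent_dim, priv_dim, scale=0.1):
    """Random weights for the hypothetical encoder heads and decoder."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]
    return {
        "w_mu": mat(latent_dim, in_dim), "b_mu": [0.0] * latent_dim,
        "w_lv": mat(latent_dim, in_dim), "b_lv": [0.0] * latent_dim,
        "w_dec": mat(priv_dim, latent_dim), "b_dec": [0.0] * priv_dim,
    }


def kl_to_std_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)): the compression term bounding I(H; Z)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def vib_loss(history, privileged, params, beta, rng):
    """Prediction error on the privileged vector plus beta * KL (variational IB)."""
    mu = linear(params["w_mu"], params["b_mu"], history)
    logvar = linear(params["w_lv"], params["b_lv"], history)
    # Reparameterised sample z = mu + sigma * eps with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    pred = linear(params["w_dec"], params["b_dec"], z)
    recon = sum((p - e) ** 2 for p, e in zip(pred, privileged))
    return recon + beta * kl_to_std_normal(mu, logvar), z


if __name__ == "__main__":
    rng = random.Random(0)
    window = HistoryWindow(HISTORY_LEN, OBS_DIM)
    params = init_params(rng, HISTORY_LEN * OBS_DIM, LATENT_DIM, PRIV_DIM)
    for _ in range(6):  # more pushes than the window holds
        window.push([rng.uniform(-1.0, 1.0) for _ in range(OBS_DIM)])
    loss, z = vib_loss(window.flat(), [0.5, -0.2], params, beta=1e-2, rng=rng)
    print(f"loss={loss:.4f}, z={z}")
```

Keeping the window length fixed gives the encoder a constant input size, and the `beta` weight controls how much history information the latent is allowed to retain beyond what predicts the privileged vector.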
Figure 2: A Unitree A1 traversing different terrains.
Figure 3: t-SNE visualization of the privileged-knowledge representation and the latent representation learned by the history encoder in finger spin.
Figure 4: Comparison of different HIB variants in quadruped walk.
Figure 5: HIB successfully handles high stairs. See more examples in our supplementary video.
...and 6 more figures

Theorems & Definitions (10)

Definition 1: Privileged Knowledge
Theorem 1: Policy imitation discrepancy
Theorem 2: Privilege modeling discrepancy
Theorem A.1: General oracle imitation discrepancy bound
proof
Theorem A.2: Policy imitation discrepancy
proof
Theorem A.3: Privilege modeling discrepancy
proof
Remark 1: Explanation of the Bellman Equation
