State Alignment-based Imitation Learning

Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su

TL;DR

The paper tackles imitation learning when expert and imitator have mismatched dynamics by introducing SAIL, a state alignment-based framework. SAIL combines local state prediction via a β-VAE with global state-distribution alignment through Wasserstein distance, integrated into a regularized PPO objective and preceded by a pre-training stage. Empirical results across MuJoCo tasks show SAIL outperforms or matches baselines, especially under dynamics mismatch and with limited demonstrations, while ablations confirm the value of each component. The approach offers a practical pathway for robust imitation across diverse actuators and conditions, broadening the applicability of demonstration-based learning.
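For readers unfamiliar with the β-VAE named above, the following is a minimal sketch of a generic β-VAE loss applied to next-state prediction. The function name `beta_vae_loss`, the squared-error reconstruction term, the standard-normal prior, and the default `beta` are assumptions made for illustration; they are not taken from the paper and should not be read as the authors' implementation.

```python
import math


def beta_vae_loss(next_state, predicted_next_state, mu, log_var, beta=1.0):
    """Generic beta-VAE loss: reconstruction error plus beta-weighted KL term.

    All arguments are plain lists of floats (hypothetical interface):
      next_state            observed next state from a demonstration
      predicted_next_state  decoder output for that next state
      mu, log_var           encoder mean and log-variance of the latent code
    """
    # Reconstruction term: squared error between observed and predicted state.
    recon = sum((s - p) ** 2 for s, p in zip(next_state, predicted_next_state))
    # Closed-form KL divergence from N(mu, diag(exp(log_var))) to N(0, I).
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    # beta > 1 weights the KL term more heavily than in a plain VAE.
    return recon + beta * kl
```

With a perfect reconstruction and a latent posterior equal to the prior, both terms vanish and the loss is zero; raising `beta` only changes the weight on the KL term.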

Abstract

Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most current imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method that trains the imitator to follow the state sequences in expert demonstrations as closely as possible. The state alignment comes from both local and global perspectives, and we combine them in a reinforcement learning framework through a regularized policy update objective. We show the superiority of our method in standard imitation learning settings and in settings where the expert and imitator have different dynamics models.

Paper Structure

This paper contains 25 sections, 11 equations, 6 figures, 12 tables, 1 algorithm.

Figures (6)

  • Figure 1: Using a VAE as a state-predictive model is more self-correcting because of its stochastic sampling mechanism. This does not happen when a VAE is used to predict actions.
  • Figure 2: Visualization of state alignment
  • Figure 3: Comparison with BC, GAIL and AIRL when the dynamics differ from the expert's.
  • Figure 4: Imitation Learning of Actors with Heterogeneous Action Dynamics.
  • Figure 5: (a) and (b) show the effects of Wasserstein distance and KL regularization on HalfCheetah-v2 and Humanoid-v2 given 20 demonstration trajectories; (c) presents the result on Antmaze.
  • ...and 1 more figure