State Alignment-based Imitation Learning
Fangchen Liu, Zhan Ling, Tongzhou Mu, Hao Su
TL;DR
The paper tackles imitation learning when expert and imitator have mismatched dynamics by introducing SAIL, a state alignment-based framework. SAIL combines local state prediction via a β-VAE with global state-distribution alignment through Wasserstein distance, integrated into a regularized PPO objective and preceded by a pre-training stage. Empirical results across MuJoCo tasks show SAIL outperforms or matches baselines, especially under dynamics mismatch and with limited demonstrations, while ablations confirm the value of each component. The approach offers a practical pathway for robust imitation across diverse actuators and conditions, broadening the applicability of demonstration-based learning.
Abstract
Consider an imitation learning problem in which the imitator and the expert have different dynamics models. Most current imitation learning methods fail in this setting because they focus on imitating actions. We propose a novel state alignment-based imitation learning method that trains the imitator to follow the state sequences in expert demonstrations as closely as possible. The state alignment comes from both local and global perspectives, and we combine them in a reinforcement learning framework through a regularized policy update objective. We show the superiority of our method in standard imitation learning settings and in imitation learning settings where the expert and imitator have different dynamics models.
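To make the idea of comparing state distributions rather than actions concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes scalar (1-D) states and equal-size samples, in which case the empirical 1-Wasserstein distance reduces to the mean absolute difference between sorted samples. The function name `wasserstein_1d` and the sample values are hypothetical and chosen only for illustration.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    Toy assumption: states are scalars. High-dimensional states would
    need a different estimator.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    if not xs:
        raise ValueError("samples must be non-empty")
    # In 1-D, the optimal transport plan pairs the sorted samples in order.
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Hypothetical visited states: no actions appear anywhere in the comparison.
expert_states = [0.0, 1.0, 2.0, 3.0]
aligned_states = [0.5, 1.5, 2.5, 3.5]   # imitator stays close to the expert
drifted_states = [4.0, 5.0, 6.0, 7.0]   # imitator visits different states

d_aligned = wasserstein_1d(expert_states, aligned_states)
d_drifted = wasserstein_1d(expert_states, drifted_states)
print(d_aligned, d_drifted)
```

Because the comparison uses only visited states, two agents with different actuators can in principle score well as long as they reach similar states, which is the intuition behind aligning states instead of actions.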
