SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar
TL;DR
SECANT addresses zero-shot generalization in visual reinforcement learning with a two-stage self-expert cloning framework: it first trains a high-performing expert by RL with weak augmentations, then distills the expert's behavior into a student that learns from strongly augmented observations. This decouples policy optimization from robust representation learning and enables generalization to unseen visual environments without test-time rewards or adaptation. Experiments across DMControl, Robosuite, CARLA, and iGibson show consistent improvements over prior state-of-the-art methods in average reward and representation robustness, along with faster inference than competing methods. The work also offers practical insights on augmentation strategies, imitation design, and the benefits of sequential two-stage training for robust visual policies.
Abstract
Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). Code release and video are available at https://linxifan.github.io/secant-site/.
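The second stage described above, in which a student imitates the expert under strong augmentation, can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes flat vector observations in place of images, a frozen random linear-tanh policy standing in for the RL-trained expert (stage 1 is not reproduced), and hypothetical `weak_aug` and `strong_aug` functions chosen only for illustration. The point it shows is that the expert labels weakly augmented observations while the student is fit on strongly augmented views of the same observations.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 2  # toy sizes (assumption; real observations are images)

def weak_aug(obs, rng):
    # Hypothetical weak augmentation: tiny additive noise.
    return obs + 0.01 * rng.standard_normal(obs.shape)

def strong_aug(obs, rng):
    # Hypothetical strong augmentation: per-sample intensity scaling plus larger noise.
    scale = rng.uniform(0.5, 1.5, size=(obs.shape[0], 1))
    return scale * obs + 0.3 * rng.standard_normal(obs.shape)

# Stage 1 stand-in (assumption): a frozen policy playing the role of the
# expert that would normally be trained by RL with weak augmentation.
W_expert = rng.standard_normal((OBS_DIM, ACT_DIM))

def expert_act(obs):
    return np.tanh(obs @ W_expert)

def distill(steps=500, batch=64, lr=0.1):
    """Stage 2: supervised imitation of the expert under strong augmentation."""
    W_student = np.zeros((OBS_DIM, ACT_DIM))
    losses = []
    for _ in range(steps):
        obs = rng.standard_normal((batch, OBS_DIM))
        target = expert_act(weak_aug(obs, rng))   # expert sees the weak view
        x = strong_aug(obs, rng)                  # student sees the strong view
        pred = np.tanh(x @ W_student)
        err = pred - target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error through the tanh output.
        grad = x.T @ (err * (1.0 - pred ** 2)) * (2.0 / err.size)
        W_student -= lr * grad
    return W_student, losses

W_student, losses = distill()
print("imitation loss, first vs. last step:", losses[0], losses[-1])
```

Because the regression targets come from the expert rather than from rewards, the student's update is plain supervised learning, which is what allows heavy augmentation to be applied without destabilizing policy optimization.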
