Efficient Training of Generalizable Visuomotor Policies via Control-Aware Augmentation

Yinuo Zhao, Kun Wu, Tianjiao Yi, Zhiyuan Xu, Xiaozhu Ju, Zhengping Che, Chi Harold Liu, Jian Tang

TL;DR

EAGLE enhances generalization by applying augmentation only to control-related regions using a self-supervised, control-aware mask, and boosts training efficiency and stability by transferring knowledge from an expert to a student policy, enabling deployment in new environments without further fine-tuning.

Abstract

Improving generalization is a key challenge in embodied AI, where obtaining large-scale datasets across diverse scenarios is costly. Traditional weak augmentations, such as cropping and flipping, are insufficient for improving a model's performance in new environments. Existing data augmentation methods often disrupt task-relevant information in images, potentially degrading performance. To overcome these challenges, we introduce EAGLE, an efficient training framework for generalizable visuomotor policies that improves upon existing methods in two ways: (1) it enhances generalization by applying augmentation only to control-related regions identified through a self-supervised control-aware mask, and (2) it improves training stability and efficiency by distilling knowledge from an expert to a visuomotor student policy, which is then deployed to unseen environments without further fine-tuning. Comprehensive experiments on three domains, including the DMControl Generalization Benchmark, the enhanced Robot Manipulation Distraction Benchmark, and a long-sequential drawer-opening task, validate the effectiveness of our method.
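
The mask-gated augmentation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `masked_augment` and `random_overlay`, the hand-made binary mask, and the noise-overlay augmentation are all assumptions chosen for illustration. In EAGLE the mask comes from a self-supervised module, which is not reproduced here; the sketch only shows how a given mask can restrict where an augmentation takes effect.

```python
import numpy as np

def masked_augment(obs, mask, augment_fn):
    """Apply augment_fn to obs, but keep the result only where mask == 1.

    obs:  float image of shape (H, W, C)
    mask: binary array of shape (H, W) or (H, W, 1); 1 marks pixels
          selected for augmentation (hypothetical, hand-made here)
    """
    obs = np.asarray(obs, dtype=np.float32)
    mask = np.asarray(mask, dtype=np.float32)
    if mask.ndim == obs.ndim - 1:
        mask = mask[..., None]  # broadcast the mask over channels
    augmented = augment_fn(obs)
    # Pixels outside the mask are copied through unchanged.
    return mask * augmented + (1.0 - mask) * obs

def random_overlay(obs, rng, alpha=0.5):
    """Illustrative strong augmentation: blend with uniform noise."""
    noise = rng.uniform(0.0, 1.0, size=obs.shape).astype(np.float32)
    return (1.0 - alpha) * obs + alpha * noise

rng = np.random.default_rng(0)
obs = np.full((4, 4, 3), 0.2, dtype=np.float32)
mask = np.zeros((4, 4), dtype=np.float32)
mask[:2, :] = 1.0  # select the top half of the image

out = masked_augment(obs, mask, lambda x: random_overlay(x, rng))
```

With this gating, the bottom half of `out` is identical to `obs`, while the top half carries the overlay. Swapping in a different `augment_fn` leaves the gating logic unchanged.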

Paper Structure

This paper contains 21 sections, 1 theorem, 10 equations, 15 figures, 7 tables, 1 algorithm.

Key Result

Theorem 3.1

Assume that the loss function $\ell$ is Lipschitz continuous with Lipschitz constant $L_\ell$ with respect to its first argument, and bounded by $C$. Then, with probability at least $1 - \delta$, for all $\theta \in \Theta$: [bound not captured in this extract] where $\mathcal{L}(\pi_\theta) = \mathbb{E}[\ell(\pi_{\theta}, \pi^*)]$ is the expected loss of the visuomotor policy $\pi_{\theta}$ with respect to the expert policy $\pi_e$ and $\hat{\mathcal{

Figures (15)

  • Figure 1: Overview of our method.
  • Figure 2: Control-aware data augmentation module.
  • Figure 3: Observation examples from three benchmarks. Top row: DMC-GB (first: training, second: video hard setting, last two: distraction setting). Middle row: Enhanced Robot Manipulation Distraction Benchmark (first: training, last three: testing). Bottom row: Self-designed Drawer Opening Generalization Benchmark (first: training, second: different backgrounds, last two: different scenarios).
  • Figure 4: Generalization performance of all methods on DMC-GB distraction setting.
  • Figure 5: Illustration of the mask obtained by EAGLE and SAM on Hammer and Push. First column: original observations. Second column: EAGLE's mask. Third column: SAM's mask. Last column: control-aware augmented observations.
  • ...and 10 more figures

Theorems & Definitions (1)

  • Theorem 3.1: Generalization Error for Policy Distillation
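
To make the assumptions of Theorem 3.1 concrete, the sketch below computes a sample-mean distillation loss between student and expert actions. Several things here are assumptions rather than details from the paper: the clipped L1 loss is just one choice that is both Lipschitz in its first argument (constant 1 under the L1 norm) and bounded by $C$, the sample mean is assumed to be the empirical counterpart of $\mathcal{L}(\pi_\theta)$ because that definition is truncated in this extract, and the function names and toy actions are hypothetical.

```python
import numpy as np

def clipped_l1_loss(student_actions, expert_actions, bound):
    """Per-sample loss min(||a - a*||_1, bound).

    Lipschitz in the student action and bounded by `bound`,
    matching the two assumptions stated in the theorem.
    """
    per_sample = np.abs(student_actions - expert_actions).sum(axis=-1)
    return np.minimum(per_sample, bound)

def empirical_distillation_loss(student_actions, expert_actions, bound=1.0):
    """Sample mean of the per-sample loss over a batch (assumed estimator)."""
    losses = clipped_l1_loss(student_actions, expert_actions, bound)
    return float(losses.mean())

# Toy batch of three 2-D actions (hypothetical values).
student = np.array([[0.1, 0.0], [0.5, 0.5], [2.0, 2.0]])
expert = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 0.0]])

loss = empirical_distillation_loss(student, expert, bound=1.0)
```

The third sample has an L1 error of 4.0 but contributes only the bound of 1.0, which is what keeps the loss bounded by $C$.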