Domain Adaptation of Visual Policies with a Single Demonstration

Weiyao Wang, Gregory D. Hager

TL;DR

PromptAdapt tackles the sim-to-real gap in visuomotor control by using a single demonstration as an in-context prompt for a Transformer-based student policy distilled from a state-based teacher. The method combines policy distillation with a DAgger loop under domain randomization, enabling adaptation to target-domain visual shifts without fine-tuning. Empirical results on Distracting Control Suite (Distracting CS) tasks and UR5 manipulation show improved performance in both in-distribution and out-of-distribution settings, and ablations confirm the value of the demonstration and of its trajectory components. The work advances practical domain-adaptive visuomotor policies and highlights the potential of in-context, demonstration-driven adaptation for real-world robotics.
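The teacher-student distillation with a DAgger loop under domain randomization can be illustrated with a deliberately tiny, hypothetical example. Everything in it is an assumption for illustration only: a 1-D state, a linear teacher (`teacher_action`) that reads the privileged state, a per-episode nuisance feature standing in for randomized visual appearance (`randomize_domain`), and a linear student fit by least squares. The student rolls out its own policy (no β-mixing with the teacher), every visited state is relabeled with the teacher's action, and the student is refit on the aggregated dataset. The sketch omits the demonstration prompt and the Transformer; it only shows the DAgger data-collection pattern.

```python
import numpy as np


def teacher_action(state):
    """Hypothetical privileged teacher: a linear controller on the true state."""
    return -0.5 * state


def randomize_domain(rng):
    """Hypothetical domain randomization: one nuisance 'appearance' value per episode."""
    return rng.uniform(-1.0, 1.0)


def observe(state, appearance):
    """Student observation: true state, nuisance feature, and a constant bias term."""
    return np.array([state, appearance, 1.0])


def rollout(weights, rng, horizon=20):
    """Roll out the current student; return its observations and the true states visited."""
    appearance = randomize_domain(rng)
    state = rng.uniform(-2.0, 2.0)
    observations, states = [], []
    for _ in range(horizon):
        obs = observe(state, appearance)
        observations.append(obs)
        states.append(state)
        action = obs @ weights      # the student acts on observations only
        state = state + action      # toy dynamics: x_{t+1} = x_t + a_t
    return observations, states


def dagger(num_iters=5, episodes_per_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.zeros(3)           # untrained linear student
    X, y = [], []                   # aggregated dataset across all iterations
    for _ in range(num_iters):
        for _ in range(episodes_per_iter):
            observations, states = rollout(weights, rng)
            X.extend(observations)
            # DAgger step: label the states the *student* visited with the teacher's action.
            y.extend(teacher_action(s) for s in states)
        # Imitation step: refit the student on everything collected so far.
        weights, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return weights


w = dagger()
print(np.round(w, 6))
```

Because the nuisance feature varies across episodes but never predicts the teacher's action, the least-squares student learns to ignore it; a real visual student would need a learned image encoder and far more data.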

Abstract

Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for visuomotor policies that take high-dimensional images as input, especially when those images are generated in simulation. A common way to tackle this issue is domain randomization, which aims to broaden the training distribution so that it covers the test-time distribution. However, this approach is only effective when the randomization encompasses the actual shifts in the test-time distribution. We take a different approach: we use a single demonstration (a prompt) to learn a policy that adapts to the target test environment. Our proposed framework, PromptAdapt, leverages the Transformer architecture's capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose. Videos and more information can be found on the project webpage: https://sites.google.com/view/promptadapt.
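One way to read "demonstration-conditioned" with a Transformer is to prepend tokens encoding the demonstration to the tokens of the current episode, so that every current step can attend to the whole demonstration. The NumPy sketch below assumes exactly that layout, a single attention head with a causal mask, random weights, and already-encoded feature vectors; the names (`prompt_conditioned_action`, `params`, the token counts) are hypothetical, and the paper's actual tokenization and architecture may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    n = tokens.shape[0]
    future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True where j > i
    scores = np.where(future, -np.inf, scores)          # block attention to later tokens
    return softmax(scores) @ v


def prompt_conditioned_action(demo_feats, history_feats, params):
    """Hypothetical layout: [demonstration tokens ; current-episode tokens],
    with the action read from the final (most recent) position."""
    tokens = np.concatenate([demo_feats, history_feats], axis=0)
    hidden = causal_self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    return hidden[-1] @ params["w_out"]


rng = np.random.default_rng(0)
d_model, action_dim = 16, 4
params = {name: rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
          for name in ("w_q", "w_k", "w_v")}
params["w_out"] = rng.normal(size=(d_model, action_dim))

demo_a = rng.normal(size=(10, d_model))   # 10 tokens from one hypothetical demonstration
demo_b = rng.normal(size=(10, d_model))   # a demonstration from a different "domain"
history = rng.normal(size=(3, d_model))   # 3 tokens from the current episode

a1 = prompt_conditioned_action(demo_a, history, params)
a2 = prompt_conditioned_action(demo_b, history, params)
print(a1.shape)  # (4,)
```

With the causal mask, demonstration tokens never attend to the current episode, so the prompt's representation depends only on the demonstration, while the action read from the last position changes when the demonstration changes.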

Paper Structure (8 sections, 3 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: Left: We first train a teacher policy using privileged ground-truth state information. Right: We then distill the learned policy into a student policy that takes only visual input, via imitation learning. For the student policy, our PromptAdapt architecture uses a Transformer network to condition on a single demonstration, efficiently adapting to the target domain at test time. The same domain randomization function is applied to the demonstration images and to each per-step observation, enabling demonstration-conditioned adaptation to changes in visual appearance.
  • Figure 2: Sample observations in simulated environments. In each pair, the in-distribution sample is shown on the left and the out-of-distribution sample on the right. First row: cartpole swingup, ball_in_cup catch; second row: finger spin, walker walk; third row: UR5 reach, UR5 push.
  • Figure 3: Ablation study in UR5 simulation environments. Our method shows a smaller drop in success rate under every out-of-distribution factor, including lighting, object color, table texture, and camera pose. Error bars indicate one standard deviation.
  • Figure 4: Ablation study in Distracting CS over different variants of the demonstration.
  • Figure 5: Left: Real-world observations for UR5 reach and UR5 push. Right: Experimental setup.
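The Figure 1 caption notes that the same domain randomization function is applied to the demonstration images and to each per-step observation. A minimal sketch of that pairing follows, assuming a hypothetical randomization made of a global brightness scale and a per-channel color offset, sampled once per episode and applied to HxWx3 images with values in [0, 1]; the function names and parameter ranges are illustrative, not the paper's.

```python
import numpy as np


def sample_randomization(rng):
    """Hypothetical appearance parameters, drawn once per training episode."""
    return {
        "brightness": rng.uniform(0.6, 1.4),             # global intensity scale
        "color_shift": rng.uniform(-0.1, 0.1, size=3),   # per-channel RGB offset
    }


def apply_randomization(images, params):
    """Apply one appearance change to a batch of images shaped (T, H, W, 3)."""
    out = images * params["brightness"] + params["color_shift"]  # offset broadcasts over channels
    return np.clip(out, 0.0, 1.0)


def make_training_sample(demo_frames, episode_frames, rng):
    """Share a single randomization draw between the prompt and the observations."""
    params = sample_randomization(rng)
    return apply_randomization(demo_frames, params), apply_randomization(episode_frames, params)


rng = np.random.default_rng(0)
frame = rng.uniform(size=(1, 8, 8, 3))
demo = np.repeat(frame, 5, axis=0)      # 5 demonstration frames
episode = np.repeat(frame, 2, axis=0)   # 2 current-episode frames showing the same scene
d, e = make_training_sample(demo, episode, rng)
print(np.allclose(d[0], e[0]))  # True: identical scenes receive identical shifts
```

Sharing one draw between the prompt and the observations means the demonstration exhibits the same visual shift the policy is facing, which is what lets the student treat it as a cue for adaptation.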