State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning

Yuxiang Liu, Shengfan Cao

TL;DR

This work addresses the challenge of transferring vision-based end-to-end imitation policies across visual domains when target-domain data are off-policy, scarce, and expert-free. It introduces State-Conditional Adversarial Learning (SCAL), an off-policy transfer framework that aligns latent representations conditioned on the system state by estimating a state-conditioned KL divergence via a discriminator. The approach provides a theoretical bound linking target imitation loss to source-domain loss plus latent alignment terms, and demonstrates strong sample efficiency and transfer robustness in BARC–CARLA driving tasks. The results suggest SCAL is a practical, data-efficient method for visual domain transfer in safety-critical settings, with potential for real-world deployment and further theoretical tightening across divergences and domains.

Abstract

We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss can be upper bounded by the source-domain loss plus a state-conditional latent KL divergence between source and target observation models. Guided by this result, we propose State-Conditional Adversarial Learning (SCAL), an off-policy adversarial framework that aligns latent distributions conditioned on system state using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
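The idea of estimating a conditional KL term with a discriminator can be illustrated with the standard density-ratio trick. The following is a minimal toy sketch, not the paper's implementation: the 1-D Gaussian latents, the linear discriminator, the shift `delta`, and all function names are assumptions made for illustration. It also assumes both domains share the same state distribution, so that a discriminator on (latent, state) pairs recovers the conditional log-ratio; how SCAL treats mismatched state distributions is not covered here.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def features(z, x):
    # Linear features of (latent z, state x) plus a bias term.
    return np.stack([z, x, np.ones_like(z)], axis=1)

def train_discriminator(feats, labels, lr=0.5, steps=2000):
    # Logistic regression fitted by full-batch gradient descent.
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        p = sigmoid(feats @ w)
        w -= lr * feats.T @ (p - labels) / len(labels)
    return w

rng = np.random.default_rng(0)
n, delta = 20000, 1.0

# Toy data (hypothetical): states x, latents z | x ~ N(x, 1) in the source
# domain and z | x ~ N(x + delta, 1) in the target domain.
x_s = rng.uniform(-1, 1, n)
x_t = rng.uniform(-1, 1, n)
z_s = x_s + rng.normal(size=n)
z_t = x_t + delta + rng.normal(size=n)

# Label source pairs 1 and target pairs 0; with balanced classes the
# optimal logit equals log p_s(z | x) - log p_t(z | x).
feats = np.concatenate([features(z_s, x_s), features(z_t, x_t)])
labels = np.concatenate([np.ones(n), np.zeros(n)])
w = train_discriminator(feats, labels)

# Averaging the logit over source samples estimates
# E_x[ KL(p_s(z | x) || p_t(z | x)) ].
kl_estimate = float(np.mean(features(z_s, x_s) @ w))
kl_true = delta**2 / 2  # closed form for equal-variance Gaussians

print(f"estimated KL: {kl_estimate:.3f}, closed form: {kl_true:.3f}")
```

In this toy setting the true log-ratio is linear in (z, x), so a linear discriminator suffices; with image encoders one would presumably need a neural discriminator, and the estimate is only as good as the discriminator's fit.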

Paper Structure

This paper contains 30 sections, 4 theorems, 42 equations, 6 figures, 1 algorithm.

Key Result

Lemma 4.1

If the policy $\pi_\theta$ has an encoder $E_\phi$ that aligns $e_s$ and $e_t$, then

Figures (6)

  • Figure 1: PCA visualization of the latent space with (left) and without (right) SCAL. The latent vectors shown are sampled from exactly the same path-tracking trajectory.
  • Figure 2: Two example domains in our experiments with the same track shape but drastically different visual characteristics.
  • Figure 3: Correlation between the estimated state-conditional KL divergence and the on-policy target-domain metric.
  • Figure 4: SCAL compared with the perfect baseline under different $\mathcal{B}_t$ distributions. x-axis: target-domain buffer size. y-axis: maximum trajectory length achieved in the target domain. SCAL trained with $\mathcal{B}_t$ distribution 1 (yellow), distribution 2 (blue), and distribution 3 (purple); perfect baseline (black). The shaded area represents variance.
  • Figure 5: Three different target-domain off-policy sample distributions used in the Distributional-shift Study experiment. Brighter areas mark states sampled with higher frequency. Left: the whole track is randomly sampled. Middle: target-domain samples biased around the track's starting point. Right: target-domain samples biased around the track's midpoint.
  • ...and 1 more figure

Theorems & Definitions (10)

  • Definition 2.1: Discounted Visitation Distribution (ho2016generative, kakade2002approximately)
  • Definition 4.1: Alignment
  • Lemma 4.1
  • proof
  • Theorem 4.1
  • proof
  • Remark
  • Proposition 4.1
  • Proposition 5.1
  • proof
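
For orientation, the discounted visitation distribution named in Definition 2.1 is commonly written in the following form. This is the textbook normalized version, given here as an assumption; the paper's own notation and normalization may differ (some references omit the $(1-\gamma)$ factor). The symbols $d^{\pi}$, $\gamma$, and $s_t$ are not taken from this page.

$$
d^{\pi}(s) = (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t} \, \Pr(s_t = s \mid \pi), \qquad \gamma \in [0, 1)
$$

Here $\Pr(s_t = s \mid \pi)$ is the probability of being in state $s$ at time $t$ when following policy $\pi$, and the factor $(1-\gamma)$ makes $d^{\pi}$ sum (or integrate) to one.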