
Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations

Minung Kim, Kawon Lee, Jungmo Kim, Sungho Choi, Seungyul Han

TL;DR

The paper tackles cross-domain imitation learning with high-dimensional visual observations by introducing DIFF-IL, which combines domain-invariant per-frame feature extraction with frame-wise time labeling and adversarial sequence-level alignment. The method uses a shared encoder, domain-specific decoders, and Wasserstein GANs to remove domain-specific artifacts while preserving task-relevant cues, and it shapes rewards with frame-level and sequence-level labeling. Empirical results across Pendulum and MuJoCo tasks show that DIFF-IL achieves superior domain transfer, faster convergence, and robust imitation, and ablations confirm the importance of frame-level labeling and of balancing per-frame and sequence mapping. The approach makes vision-based cross-domain imitation more reliable, with implications for sim-to-real transfer and robust autonomous control in visually diverse settings.
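The per-frame part of this design can be sketched as a loss computation. This is a minimal, hypothetical Python illustration, not the authors' implementation: toy linear maps stand in for the shared encoder, the domain-specific decoders, and the critic, and the names `linear`, `mse`, `per_frame_losses`, and the weight `lam` are assumptions. It shows one plausible way to combine per-domain reconstruction with a Wasserstein critic gap between encoded source and target frames.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Frames = Sequence[Sequence[float]]


def linear(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Return a toy linear map x -> W x (hypothetical stand-in for a neural network)."""
    def apply(x: Sequence[float]) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return apply


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def per_frame_losses(
    src_frames: Frames,
    tgt_frames: Frames,
    encoder: Callable[[Sequence[float]], Vector],
    dec_src: Callable[[Sequence[float]], Vector],
    dec_tgt: Callable[[Sequence[float]], Vector],
    critic: Callable[[Sequence[float]], float],
    lam: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical per-frame objective: reconstruction in each domain plus a
    Wasserstein critic gap between encoded source and target frames."""
    # One shared encoder maps frames from both domains into the same latent space.
    z_src = [encoder(x) for x in src_frames]
    z_tgt = [encoder(x) for x in tgt_frames]
    # Each domain-specific decoder reconstructs its own domain from the shared latent,
    # so domain-specific appearance can live in the decoders instead of the latent.
    recon = (
        sum(mse(dec_src(z), x) for z, x in zip(z_src, src_frames)) / len(src_frames)
        + sum(mse(dec_tgt(z), x) for z, x in zip(z_tgt, tgt_frames)) / len(tgt_frames)
    )
    # Critic estimate of the Wasserstein-1 distance between latent distributions.
    # The Lipschitz constraint on the critic (weight clipping or a gradient
    # penalty) is omitted in this sketch.
    w_gap = (
        sum(critic(z) for z in z_src) / len(z_src)
        - sum(critic(z) for z in z_tgt) / len(z_tgt)
    )
    # Encoder and decoders would minimize encoder_loss; the critic would maximize w_gap.
    return {"recon": recon, "w_gap": w_gap, "encoder_loss": recon + lam * w_gap}
```

With identity encoder and decoders, the reconstruction term is zero and `w_gap` only reflects how differently the critic scores the two domains' latents; training would push the encoder to shrink that gap while the decoders keep each domain reconstructable.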

Abstract

Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.
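Frame-wise time labeling can be illustrated with a small sketch. The following assumptions are not from the paper: each expert frame is labeled with its normalized timestep t/(T-1), a nearest-neighbor lookup in feature space stands in for a learned labeler, and the reward is 1 minus the gap between the label predicted for the learner's frame and the learner's own normalized timestep. The function names `time_labels`, `nearest_label`, and `time_aligned_rewards` are hypothetical.

```python
import math
from typing import List, Sequence

Features = Sequence[Sequence[float]]


def time_labels(num_frames: int) -> List[float]:
    """Label expert frame t with its normalized timestep t / (T - 1) in [0, 1]."""
    if num_frames < 2:
        return [0.0] * num_frames
    return [t / (num_frames - 1) for t in range(num_frames)]


def nearest_label(feature: Sequence[float], expert_features: Features,
                  labels: Sequence[float]) -> float:
    """Stand-in labeler: return the time label of the closest expert feature.
    A learned classifier or regressor would replace this in practice."""
    best = min(range(len(expert_features)),
               key=lambda i: math.dist(feature, expert_features[i]))
    return labels[best]


def time_aligned_rewards(learner_features: Features,
                         expert_features: Features) -> List[float]:
    """Hypothetical reward: 1.0 when the learner's t-th frame looks like the expert
    at the same normalized time, decreasing linearly with the label mismatch."""
    labels = time_labels(len(expert_features))
    horizon = len(learner_features)
    rewards = []
    for t, feature in enumerate(learner_features):
        predicted = nearest_label(feature, expert_features, labels)
        expected = t / (horizon - 1) if horizon > 1 else 0.0
        rewards.append(1.0 - abs(predicted - expected))
    return rewards
```

A learner that visits the expert's states in reverse order receives low rewards at both ends of its trajectory even though every one of its frames matches some expert frame exactly; a purely per-frame similarity reward would miss this kind of temporal context.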

Paper Structure

This paper contains 38 sections, 16 equations, 27 figures, 6 tables, and 1 algorithm.

Figures (27)

  • Figure 1: t-SNE visualization of features extracted from: (a) Existing sequence-based IL methods, (b) DIFF-IL (ours).
  • Figure 2: Image mappings of DIFF based on aligned latent features in (a) Reacher, (b) Pendulum, and (c) MuJoCo tasks.
  • Figure 3: Illustration of frame-wise time labeling.
  • Figure 4: Structure of the proposed DIFF-IL.
  • Figure 5: Pendulum environments.
  • ...and 22 more figures