Visually Robust Adversarial Imitation Learning from Videos with Contrastive Learning

Vittorio Giammarino; James Queeney; Ioannis Ch. Paschalidis

TL;DR

This paper tackles Visual Imitation from Observations under visual mismatch between expert and agent environments by introducing C-LAIfO, an efficient end-to-end method that learns a domain-invariant latent representation through data augmentation and contrastive learning. Imitation is conducted in the latent space with off-policy adversarial learning, using two replay buffers and a discriminator to infer a reward signal and train a policy. The authors provide extensive ablations and demonstrate superior performance over baselines on mismatched visual tasks and on challenging Adroit dexterous manipulation with sparse rewards, highlighting robustness to lighting and background changes. The work also emphasizes the importance of carefully designed augmentations and latent-space training, and it releases open-source code to support reproducibility and further development.
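To make the adversarial step concrete, the following is a minimal sketch of how a discriminator over latent transitions could be trained with binary cross-entropy and then queried for a reward. It is not the authors' implementation: the linear discriminator, the function names, and the choice of the discriminator logit as the reward are all assumptions made for illustration. `expert_batch` and `agent_batch` stand in for samples drawn from the two replay buffers.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator_logit(weights, bias, z, z_next):
    # Hypothetical linear discriminator over a latent transition (z, z').
    features = list(z) + list(z_next)
    return sum(w * f for w, f in zip(weights, features)) + bias


def bce_discriminator_loss(weights, bias, expert_batch, agent_batch):
    # Binary cross-entropy: expert transitions are labeled 1, agent transitions 0.
    eps = 1e-12
    loss = 0.0
    for z, z_next in expert_batch:
        d = sigmoid(discriminator_logit(weights, bias, z, z_next))
        loss -= math.log(d + eps)
    for z, z_next in agent_batch:
        d = sigmoid(discriminator_logit(weights, bias, z, z_next))
        loss -= math.log(1.0 - d + eps)
    return loss / (len(expert_batch) + len(agent_batch))


def imitation_reward(weights, bias, z, z_next):
    # Assumed reward form: the logit, i.e. log D - log(1 - D).
    # Other adversarial imitation methods use different transformations of D.
    return discriminator_logit(weights, bias, z, z_next)
```

In this sketch the policy would be trained off-policy on `imitation_reward` in place of an environment reward, so transitions that the discriminator judges expert-like receive higher reward.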

Abstract

We propose C-LAIfO, a computationally efficient algorithm designed for imitation learning from videos in the presence of visual mismatch between agent and expert domains. We analyze the problem of imitation from expert videos with visual discrepancies, and introduce a solution for robust latent space estimation using contrastive learning and data augmentation. Provided a visually robust latent space, our algorithm performs imitation entirely within this space using off-policy adversarial imitation learning. We conduct a thorough ablation study to justify our design and test C-LAIfO on high-dimensional continuous robotic tasks. Additionally, we demonstrate how C-LAIfO can be combined with other reward signals to facilitate learning on a set of challenging hand manipulation tasks with sparse rewards. Our experiments show improved performance compared to baseline methods, highlighting the effectiveness of C-LAIfO. To ensure reproducibility, we open source our code.
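The role of contrastive learning with data augmentation can be illustrated with a generic InfoNCE-style objective. This is a sketch under assumptions, not the paper's exact contrastive loss, which is not reproduced on this page: `anchors[i]` and `positives[i]` are taken to be latent embeddings of two differently augmented views of the same frame, every other entry in the batch is treated as a negative, and `temperature` is a hypothetical hyperparameter.

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    # anchors[i] and positives[i] embed two augmented views of the same frame;
    # positives[j] with j != i serve as negatives for anchors[i].
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / n
```

Minimizing a loss of this kind pulls embeddings of the same frame under different visual perturbations together and pushes embeddings of different frames apart, which is the intuition behind a latent space that ignores lighting and background changes.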

Paper Structure (24 sections, 2 theorems, 10 equations, 14 figures, 5 tables, 1 algorithm)


Introduction
Related Work
Imitation from observation
Imitation from videos with environment mismatch
End-to-end algorithms for imitation from videos with mismatch
Preliminaries
Partially Observable Markov Decision Process
Reinforcement learning
Generative adversarial imitation learning
Modeling the visual mismatch in POMDPs
Contrastive Latent Adversarial Imitation from Observations
Adversarial imitation in latent space
Critic and encoder training step
Contrastive loss
Experiments
...and 9 more sections

Key Result

Proposition 1

Consider source and target POMDPs respectively defined by the tuples $(\mathcal{S}, \mathcal{A}, \mathcal{X}, \mathcal{T}, \mathcal{U}_T, \mathcal{R}, \rho_0, \gamma)$ and $(\mathcal{S}, \mathcal{A}, \mathcal{X}, \mathcal{T}, \mathcal{U}_S, \mathcal{R}, \rho_0, \gamma)$. Let $\mathcal{X} = (\bar{\ma

Figures (14)

Figure 1: Robotic manipulation task. Current end-to-end methods for imitation from expert videos assume that the expert and the agent operate in the same environment. Consequently, they are unable to handle variations in lighting or background.
Figure 2: Summary of C-LAIfO. In the diagram, black lines indicate shared weights among networks, blue arrows indicate forward pass through the networks, and red arrows indicate backward pass. The losses $\mathcal{L}_{D}$, $\mathcal{L}_Q$ and $\mathcal{L}(z_{\bm{\delta}})$ are respectively in \ref{eq:AIL_BCE}, \ref{eq:Q_regression_regularized}, and \ref{eq:contr_loss}. $\mathcal{L}_{\pi}$ indicates the deterministic actor-critic loss \cite{silver2014deterministic}.
Figure 3: Different environments used for the experiments in Table \ref{table_visual_experiments} and the PCA in Fig. \ref{fig:walker_PCA_light} and \ref{fig:walker_PCA_full}.
Figure 4: PCA results for the Light experiment in Table \ref{table_visual_experiments}.
Figure 5: PCA results on C-LAIfO for the Full experiment in Table \ref{table_visual_experiments} and the unseen environment in Fig. \ref{fig:walker_unseen}.
...and 9 more figures

Theorems & Definitions (4)

Proposition 1
proof
Proposition 2
proof
