Explorative Imitation Learning: A Path Signature Approach for Continuous Environments

Nathan Gavenski; Juarez Monteiro; Felipe Meneguzzi; Michael Luck; Odinaldo Rodrigues

TL;DR

The paper addresses imitation learning's reliance on large numbers of expert trajectories and on manual intervention. It proposes Continuous Imitation Learning from Observation (CILO), which integrates an inverse dynamics model, a policy, and a discriminator to learn from observations with minimal expert data. CILO uses exploration to diversify state transitions, path signatures to encode trajectory constraints, and a discriminator to decide which self-labelled samples are added to $I^s$. Across five continuous-control tasks, CILO achieves the best overall performance, in some cases surpassing the expert, while maintaining strong sample efficiency and requiring less manual intervention. The approach could enable robust imitation in complex environments with limited expert trajectories, and it opens avenues for integrating discriminative signals and alternative exploration strategies into imitation learning from observation.
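
As a rough illustration of the discriminator-gated augmentation step mentioned above, the sketch below adds candidate self-labelled transitions to $I^s$ only when a discriminator scores them as expert-like. The function `augment_self_labelled`, the acceptance `threshold` of 0.5 and the toy discriminator are hypothetical choices for illustration, not CILO's actual acceptance rule.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (state, next_state, action inferred by an inverse dynamics model).
Transition = Tuple[List[float], List[float], List[float]]


def augment_self_labelled(
    i_s: List[Transition],
    candidates: List[Transition],
    discriminator: Callable[[List[float], List[float]], float],
    threshold: float = 0.5,  # assumed acceptance threshold, not taken from the paper
) -> List[Transition]:
    """Return a new I^s extended with the candidates whose (s, s') pair the
    discriminator scores at or above `threshold`."""
    accepted = [t for t in candidates if discriminator(t[0], t[1]) >= threshold]
    return i_s + accepted


def toy_discriminator(s: List[float], s_next: List[float]) -> float:
    # Hypothetical stand-in: treat forward progress in the first coordinate as expert-like.
    return 1.0 if s_next[0] > s[0] else 0.0


pool = [([0.0], [1.0], [0.5]), ([1.0], [0.5], [-0.2])]
i_s = augment_self_labelled([], pool, toy_discriminator)
print(len(i_s))  # 1
```

Filtering before insertion keeps poorly labelled agent transitions out of $I^s$; in a real training loop the discriminator would be learned rather than hand-written.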

Abstract

Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to increase generalisation and on human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method augmenting imitation learning with two important features: (i) exploration, allowing for more diverse state transitions, requiring fewer expert trajectories and resulting in fewer training iterations; and (ii) path signatures, allowing for automatic encoding of constraints through the creation of non-parametric representations of agent and expert trajectories. We compared CILO with a baseline and two leading imitation learning methods in five environments. It had the best overall performance of all methods in all environments, outperforming the expert in two of them.
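
To make "path signature" concrete, the following is a minimal sketch of a signature truncated at level 2 for a piecewise-linear trajectory, computed with Chen's identity. It assumes a trajectory is a NumPy array of states over time; the truncation depth, the function name `signature_depth2` and the absence of any time augmentation are assumptions, and the paper may compute signatures differently (for example, at a higher depth or with a dedicated library).

```python
from typing import Tuple

import numpy as np


def signature_depth2(path: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Level-1 and level-2 signature terms of a piecewise-linear path.

    path: array of shape (T, d), one row per time step (assumed layout).
    Returns (level1, level2) with shapes (d,) and (d, d).
    """
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment `delta`:
        # S2 <- S2 + S1 (x) delta + delta (x) delta / 2, using S1 *before* the update.
        level2 += np.outer(level1, delta) + np.outer(delta, delta) / 2.0
        level1 += delta
    return level1, level2


# Usage: move right, then up.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
l1, l2 = signature_depth2(path)
features = np.concatenate([l1, l2.ravel()])  # fixed-length vector (d + d*d)
```

Level 1 is the total displacement; the antisymmetric part of level 2 is the signed (Lévy) area, which lets a signature distinguish trajectories that reach the same end point by different routes. Flattening both levels gives a vector whose length does not depend on the number of time steps.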

Paper Structure

This paper contains 27 sections, 15 equations, 6 figures, 5 tables and 1 algorithm.

Problem Formulation
Continuous Imitation Learning from Observation
Exploration
Goal-aware function
Sample efficiency
Experimental Results
Implementation and Metrics
Results
Discussion
Sample Efficiency
Ground-truth error over time
Signature approximation over time
Effects of Gaussian exploration
$I^s$ size over time
Related Work
...and 12 more sections

Figures (6)

Figure 1: CILO's training cycle.
Figure 2: (a) and (b) show the ground-truth error for $\mathcal{M}$ and $\mathcal{P}$. (c) shows the normalised difference between the signatures of $\pi_e$, $\pi_\theta$ and a random policy: $0$ is equivalent to the expert, and $\geqslant 1$ means equal to or worse than the random policy's signature.
Figure 3: Distribution of expert actions for Ant and HalfCheetah environments.
Figure 4: Size of $I^s$ over epochs for all environments.
Figure 5: A single frame for each environment used in this work.
...and 1 more figure
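
The normalisation described in the Figure 2(c) caption can be read as the distance between the agent and expert signatures divided by the distance between the random-policy and expert signatures, so that the expert scores $0$ and the random policy scores $1$. The sketch below assumes Euclidean distance over flattened signature vectors (the exact distance is not stated here); `normalised_signature_gap` is a hypothetical name.

```python
import numpy as np


def normalised_signature_gap(sig_agent, sig_expert, sig_random) -> float:
    """0 when the agent's signature equals the expert's; >= 1 when it is at
    least as far from the expert as the random policy's signature.

    Assumes Euclidean distance between flattened signature vectors.
    """
    agent, expert, random_ = (np.asarray(x, dtype=float) for x in (sig_agent, sig_expert, sig_random))
    denom = np.linalg.norm(random_ - expert)
    if denom == 0.0:
        raise ValueError("random and expert signatures coincide; the gap is undefined")
    return float(np.linalg.norm(agent - expert) / denom)


# Usage with toy 2-D signature vectors.
expert_sig, random_sig = [0.0, 0.0], [3.0, 4.0]
gap = normalised_signature_gap([0.6, 0.8], expert_sig, random_sig)
```

Dividing by the random-to-expert distance makes the score comparable across environments whose signatures live on very different scales.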

Theorems & Definitions (2)

Definition 1
Definition 2