Noise-Guided Transport for Imitation Learning

Lionel Blondé, Joao A. Candido Ramos, Alexandros Kalousis

TL;DR

Noise-Guided Transport (NGT) tackles imitation learning in the ultra-low-data regime by learning a reward signal through an adversarial objective grounded in optimal transport (OT). It combines a predictor f_ξ with a frozen random prior f†_ξ to form a 1-Lipschitz potential h_ξ and defines the reward r_ξ=exp(-h_ξ). The objective L(ξ)=E_{expert}[h_ξ]−E_{agent}[h_ξ] is connected to the Earth Mover's Distance between the expert and agent distributions. The method enforces Lipschitz continuity with spectral normalization and orthogonal initialization, and uses distributional HL-Gaussian losses to stabilize training in high-dimensional tasks, reaching expert performance with as few as 20 transitions, even in state-only settings. Theoretical results include a concentration bound for the empirical loss and a Lipschitz analysis of the HL-Gaussian loss, which guide hyperparameters such as the smoothing scale σ and the bin count N. Overall, NGT is a lightweight, off-policy, pretraining-free IL approach with strong data efficiency, competitive runtime, and applicability to domains such as healthcare and biorobotics, where demonstrations are scarce.
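To make the reward and objective above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the linear maps standing in for f_ξ and f†_ξ, the absolute-discrepancy form of the potential, and all function names (`unit_vector`, `potential`, `reward`, `empirical_loss`) are assumptions chosen for illustration. Only r = exp(-h) and L = E_{expert}[h] − E_{agent}[h] are taken from the summary.

```python
import math
import random


def unit_vector(dim, rng):
    """Random weight vector rescaled to unit L2 norm, so x -> w.x is 1-Lipschitz."""
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def potential(w_pred, w_prior, x):
    """Assumed potential h(x) = |f(x) - f_prior(x)| (an illustrative choice).

    With two unit-norm linear maps this is at most 2-Lipschitz, not 1-Lipschitz.
    """
    return abs(dot(w_pred, x) - dot(w_prior, x))


def reward(w_pred, w_prior, x):
    """r(x) = exp(-h(x)), as in the summary; lies in (0, 1] because h >= 0."""
    return math.exp(-potential(w_pred, w_prior, x))


def empirical_loss(w_pred, w_prior, expert_batch, agent_batch):
    """Monte Carlo estimate of L = E_expert[h] - E_agent[h]."""
    h_expert = [potential(w_pred, w_prior, x) for x in expert_batch]
    h_agent = [potential(w_pred, w_prior, x) for x in agent_batch]
    return sum(h_expert) / len(h_expert) - sum(h_agent) / len(h_agent)


rng = random.Random(0)
dim = 4
w_prior = unit_vector(dim, rng)  # frozen random prior: never updated
w_pred = unit_vector(dim, rng)   # predictor: would be trained in practice

# Toy data: 20 "expert" samples and 64 "agent" samples (hypothetical).
expert_batch = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]
agent_batch = [[rng.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(64)]

loss = empirical_loss(w_pred, w_prior, expert_batch, agent_batch)
rewards = [reward(w_pred, w_prior, x) for x in agent_batch]
print(f"loss estimate: {loss:.4f}, mean agent reward: {sum(rewards) / len(rewards):.4f}")
```

In a fuller implementation the linear maps would be replaced by neural networks, with the Lipschitz constraint imposed through spectral normalization and orthogonal initialization as the summary describes; the unit-norm rescaling here is only a stand-in for that constraint.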

Abstract

We consider imitation learning in the low-data regime, where only a limited number of expert demonstrations are available. In this setting, methods that rely on large-scale pretraining or high-capacity architectures can be difficult to apply, and efficiency with respect to demonstration data becomes critical. We introduce Noise-Guided Transport (NGT), a lightweight off-policy method that casts imitation as an optimal transport problem solved via adversarial training. NGT requires no pretraining or specialized architectures, incorporates uncertainty estimation by design, and is easy to implement and tune. Despite its simplicity, NGT achieves strong performance on challenging continuous control tasks, including high-dimensional Humanoid tasks, under ultra-low data regimes with as few as 20 transitions. Code is publicly available at: https://github.com/lionelblonde/ngt-pytorch.

Paper Structure

This paper contains 31 sections, 6 theorems, 27 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Theorem 4.1

Let $\Lambda(\cdot)$ denote the Lipschitz constant of a given function. By construction, $h_{\xi}$ is $\Lambda(h_{\xi})$-Lipschitz continuous w.r.t. a ground metric $d$ over $\mathbb{X}$, i.e., $\forall x_1, x_2 \in \mathbb{X}$: $|h_{\xi}(x_1) - h_{\xi}(x_2)| \leq \Lambda(h_{\xi}) \, d(x_1, x_2)$, with $\Lambda(h_{\xi})$ as Lipschitz constant.
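
A Lipschitz bound of this kind is what ties the objective to the Earth Mover's Distance, through Kantorovich-Rubinstein duality. This is a standard result rather than a statement from this summary, and the symbols $P_{\text{expert}}$ and $P_{\text{agent}}$ for the two distributions are notation assumed here:

$$
W_1(P_{\text{agent}}, P_{\text{expert}}) = \sup_{\Lambda(h) \leq 1} \; \mathbb{E}_{x \sim P_{\text{agent}}}[h(x)] - \mathbb{E}_{x \sim P_{\text{expert}}}[h(x)]
$$

Assuming $L(\xi) = \mathbb{E}_{\text{expert}}[h_{\xi}] - \mathbb{E}_{\text{agent}}[h_{\xi}]$ is minimized over $\xi$ and every $h_{\xi}$ is 1-Lipschitz w.r.t. $d$, it follows that $L(\xi) \geq -W_1(P_{\text{agent}}, P_{\text{expert}})$ for all $\xi$. Equality is approached only if the family $\{h_{\xi}\}$ is rich enough to attain the supremum.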

Figures (9)

  • Figure 1: Performance of a subset of methods, aggregated over tasks and number of demonstrations. Humanoids not included.
  • Figure 2: Performance comparison over various environments and numbers of demonstrations.
  • Figure 3: NGT's unnormalized performance across varying numbers of demonstrations and subsampling rates, in the state-action (first row) and state-state setting (second row), in Humanoid-v4.
  • Figure 4: Comparison of the histogram loss $\ell_{\operatorname{HLG}}$ and a Mean Squared Error (MSE) Softmax loss in NGT on the Humanoid environment. The MSE variant fails to yield meaningful learning, while $\ell_{\operatorname{HLG}}$ enables successful and stable training.
  • Figure 5: Performance of the histogram loss $\ell_{\operatorname{HLG}}$ on non-Humanoid environments. Optimal results are recovered when adapting hyper-parameters such as support width, number of bins, and smoothing factor $\sigma$, highlighting the need for environment-specific scaling.
  • ...and 4 more figures
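
The histogram loss $\ell_{\operatorname{HLG}}$ in Figures 4 and 5 depends on a support width, a number of bins, and a smoothing factor $\sigma$. The sketch below shows one common formulation of an HL-Gaussian loss: a scalar target is spread over the bins with a Gaussian of scale $\sigma$, and the loss is the cross-entropy against a softmax over predicted logits. How NGT wires this loss between predictor and prior is not stated in this summary, so the function names and the parameters `v_min`, `v_max`, and `num_bins` are assumptions for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hl_gaussian_target(y, v_min, v_max, num_bins, sigma):
    """Spread scalar y over num_bins equal-width bins on [v_min, v_max].

    Each bin receives the Gaussian mass N(y, sigma^2) falling inside it,
    renormalized over the support. Assumes y lies within (or near) the
    support, so that the total mass is not numerically zero.
    """
    width = (v_max - v_min) / num_bins
    edges = [v_min + i * width for i in range(num_bins + 1)]
    cdf = [normal_cdf((e - y) / sigma) for e in edges]
    total = cdf[-1] - cdf[0]
    return [(cdf[i + 1] - cdf[i]) / total for i in range(num_bins)]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def hl_gaussian_loss(logits, y, v_min, v_max, sigma):
    """Cross-entropy between the smoothed target histogram and softmax(logits)."""
    target = hl_gaussian_target(y, v_min, v_max, len(logits), sigma)
    log_probs = log_softmax(logits)
    return -sum(p * lp for p, lp in zip(target, log_probs))


# Hypothetical settings: support [-1, 1], 10 bins, sigma = 0.15.
target = hl_gaussian_target(0.3, -1.0, 1.0, 10, 0.15)
uniform_loss = hl_gaussian_loss([0.0] * 10, 0.3, -1.0, 1.0, 0.15)
print([round(p, 3) for p in target])
print(round(uniform_loss, 4))
```

A smaller `sigma` relative to the bin width concentrates the target on fewer bins, and a wider support with the same bin count coarsens the resolution, which is one way to read the environment-specific tuning mentioned in the Figure 5 caption.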

Theorems & Definitions (11)

  • Theorem 4.1: Lipschitz constant of $h_{\xi}$
  • Theorem 4.2: Lipschitz continuity of $\ell_{\operatorname{HLG}}$
  • Theorem F.2: Concentration bound for the reward loss
  • proof
  • Corollary F.3: For $h \in H^{1}$
  • proof
  • Definition H.1: Groundwork
  • Theorem H.2: Lipschitz constant of $\ell_{\operatorname{HLG}}$ ($p_{\operatorname{max}}$ version)
  • proof
  • Lemma H.3: Maximum probability mass $p_{\operatorname{max}}$
  • ...and 1 more