Learning from Demonstration with Implicit Nonlinear Dynamics Models

Peter David Fagan, Subramanian Ramamoorthy

TL;DR

The paper tackles error drift in Learning from Demonstration (LfD) by introducing an Echo State Layer (ESL) that embeds a fixed nonlinear dynamical system, guided by echo-state principles, into neural networks. ESL combines fixed reservoir-like dynamics with learnable input embeddings and a learnable readout, enabling task conditioning while preserving temporal inductive biases. Empirical results on the LASA handwriting dataset show ESL improves precision and robustness to noise, maintains competitive latency, and generalises across multiple dynamics regimes compared to Echo State Networks and temporal ensembling baselines. The work provides a practical RC-inspired layer and an open-source JAX/Flax library, with potential to enhance robot imitation and sequential decision-making tasks.

Abstract

Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e. the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem through scaling data collection, correcting policy errors with a human-in-the-loop, temporally ensembling policy predictions or through learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through empirical experiments we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.
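As a rough illustration of the kind of layer the abstract describes, the following minimal NumPy sketch steps a fixed reservoir forward in time. It assumes the standard leaky-integrator echo state update with a `tanh` nonlinearity and a reservoir rescaled to a chosen spectral radius; the paper's exact dynamics equation is not reproduced here, and all names (`make_reservoir`, `esl_step`, `readout`, `leak`, `spectral_radius`) are hypothetical rather than taken from the authors' JAX/Flax library.

```python
import numpy as np


def make_reservoir(n_state, n_input, spectral_radius=0.9, seed=0):
    """Sample fixed (non-learnable) reservoir and input weights (assumed scheme)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_state, n_state))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    w_in = rng.uniform(-1.0, 1.0, size=(n_state, n_input))
    return w, w_in


def esl_step(x, u, learned_embedding, w, w_in, leak=0.5):
    """One discrete-time update of the fixed dynamics (assumed leaky ESN form).

    x: current state, shape (n_state,)
    u: current input, e.g. pen coordinates, shape (n_input,)
    learned_embedding: output of a learnable network, shape (n_state,)
    """
    # The fixed and learnable embeddings are added before entering the dynamics.
    drive = w_in @ u + learned_embedding
    return (1.0 - leak) * x + leak * np.tanh(w @ x + drive)


def readout(x, w_out):
    """Linear readout mapping the state to predicted coordinates."""
    return w_out @ x


w, w_in = make_reservoir(n_state=50, n_input=2)
w_out = np.zeros((2, 50))  # placeholder standing in for a trained readout
x = np.zeros(50)
for u in np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.3]]):
    # A zero vector stands in for the learnable embedding in this toy rollout.
    x = esl_step(x, u, learned_embedding=np.zeros(50), w=w, w_in=w_in)
prediction = readout(x, w_out)
```

In this sketch only `learned_embedding` and `w_out` would be trained; `w` and `w_in` stay fixed, so the temporal behaviour is governed by `spectral_radius` and `leak`.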

Paper Structure

This paper contains 45 sections, 16 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Top: A high-level overview of our architecture for the multi-task human handwriting problem. The current pen coordinates, represented by $[u_{1,t}, u_{2,t}]$, are mapped to learnable and non-learnable embeddings using an MLP and a fixed linear map $\textbf{W}_{\text{in}}$ respectively. An image of the character to be drawn is also mapped to a learnt embedding using a ResNet layer. All learnt embeddings are concatenated and passed through a sequence of self-attention blocks. The resulting embeddings are added to the non-learnable embedding of the pen state and then passed as input to the implicit nonlinear dynamics model, which generates a new dynamical system state that is mapped to predicted pen coordinates. Bottom: Visualisation of a predicted trajectory for the character "S".
  • Figure 2: Our dynamics layer adds the embeddings produced by the fixed and learnable transformations before passing the sum to the computational graph representing our dynamical system. Through the discrete-time forward dynamics defined in Eqn. \ref{eqn:esn_dynamics}, we generate the next state of the dynamical system, which is used to predict actions.
  • Figure 3: All plots show results on the "S" character drawing task; similar results are observed for other tasks. (a-d) Expert demonstration and predicted drawing trajectories for all models. We plot the area between the expert demonstration and each predicted trajectory to highlight how well the trajectory aligns with the demonstration. (e) The Fréchet distance evaluated for varying levels of random noise. Noise is sampled from a unit Gaussian and scaled according to the noise scale parameter. (f) Boxplots of the mean absolute jerk of trajectories generated by the various models. (g) Absolute Euclidean distance calculated on dynamically time warped predicted and demonstration trajectories. In this plot, we evaluate the alignment of velocity dynamics for varying levels of temporal ensembling and compare it with our method and the ESN baseline.
  • Figure 4: Illustration of taking a trajectory of samples and subsampling it into discrete subsequences (length 3 in the given example) and finally stacking the resulting sequences to create a dataset of inputs and targets.
  • Figure 5: Overlays of an expert demonstration and the trajectory produced by the trained model across the "S", "C" and "L" character drawing tasks for each model.
  • ...and 4 more figures
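
The subsequence construction described in the Figure 4 caption can be sketched roughly as follows. This is a minimal NumPy example under stated assumptions: windows overlap with a stride of one sample, and each target window is the input window shifted one step into the future. Neither choice is confirmed by the caption, and the function name `make_subsequence_dataset` is hypothetical.

```python
import numpy as np


def make_subsequence_dataset(trajectory, seq_len=3):
    """Stack overlapping windows of a trajectory into inputs and targets.

    trajectory: array of shape (T, D).
    Returns inputs and targets, each of shape (T - seq_len, seq_len, D).
    Assumption: targets are the inputs shifted one step into the future.
    """
    trajectory = np.asarray(trajectory)
    n_windows = len(trajectory) - seq_len
    if n_windows < 1:
        raise ValueError("trajectory too short for the requested seq_len")
    # Row i of `starts` holds the sample indices i, i + 1, ..., i + seq_len - 1.
    starts = np.arange(n_windows)[:, None] + np.arange(seq_len)[None, :]
    inputs = trajectory[starts]
    targets = trajectory[starts + 1]
    return inputs, targets


# Toy trajectory: 6 samples of 2-D pen coordinates.
traj = np.arange(12, dtype=float).reshape(6, 2)
inputs, targets = make_subsequence_dataset(traj, seq_len=3)
```

With six samples and a window length of 3, this yields three input windows and three target windows, each of shape `(3, 2)`.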