Learning from Demonstration with Implicit Nonlinear Dynamics Models
Peter David Fagan, Subramanian Ramamoorthy
TL;DR
The paper tackles error drift in Learning from Demonstration (LfD) by introducing an Echo State Layer (ESL), which embeds a fixed nonlinear dynamical system, designed according to echo-state principles, into a neural network. The ESL combines fixed, reservoir-like dynamics with a learnable input embedding and a learnable readout. This enables task conditioning while preserving the temporal inductive bias of the fixed dynamics. On the LASA Human Handwriting Dataset, the ESL improves precision and robustness to noise relative to Echo State Network (ESN) and temporal-ensembling baselines, while maintaining competitive latency and generalising across multiple dynamics regimes. The work also contributes a practical layer inspired by reservoir computing (RC) and an open-source JAX/Flax library, with potential applications in robot imitation and other sequential decision-making tasks.
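To make the layer structure above concrete, here is a minimal NumPy sketch of one plausible forward pass. It assumes a standard leaky-integrator echo state update with a sparse recurrent matrix rescaled to a target spectral radius. The helper names (`make_reservoir`, `esl_forward`) and parameters (`spectral_radius`, `density`, `leak_rate`, `W_embed`, `W_out`) are hypothetical and not taken from the paper's library. The update rule is a common reservoir-computing choice, not necessarily the paper's exact formulation. The reservoir matrix stays fixed after initialisation; only the input embedding and the readout would be trained.

```python
import numpy as np


def make_reservoir(n_units, spectral_radius=0.9, density=0.2, seed=0):
    """Hypothetical helper: fixed sparse recurrent matrix with a target spectral radius.

    Scaling the spectral radius below 1 is a common heuristic for obtaining
    the echo state (fading memory) property.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    W = W * (rng.random((n_units, n_units)) < density)  # sparsify
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / radius)


def esl_forward(inputs, params, W_res, leak_rate=0.3):
    """Hypothetical echo-state-style layer applied over a sequence.

    inputs: array of shape (T, d_in)
    params: learnable 'W_embed' (d_in, n_units) and 'W_out' (n_units, d_out)
    W_res:  fixed reservoir matrix (n_units, n_units), excluded from training
    Returns an array of shape (T, d_out).
    """
    state = np.zeros(W_res.shape[0])
    outputs = []
    for u in inputs:
        # Learnable embedding projects the input (and any task conditioning) into the reservoir.
        drive = u @ params["W_embed"]
        # Fixed nonlinear dynamics: leaky-integrator update with a tanh nonlinearity.
        state = (1.0 - leak_rate) * state + leak_rate * np.tanh(W_res @ state + drive)
        # Learnable linear readout of the reservoir state.
        outputs.append(state @ params["W_out"])
    return np.stack(outputs)


# Usage: a 2-D trajectory (e.g. pen x/y positions) through a 64-unit reservoir.
rng = np.random.default_rng(1)
W_res = make_reservoir(64, spectral_radius=0.9)
params = {
    "W_embed": 0.1 * rng.standard_normal((2, 64)),
    "W_out": 0.1 * rng.standard_normal((64, 2)),
}
traj = rng.standard_normal((50, 2))
out = esl_forward(traj, params, W_res)
print(out.shape)  # (50, 2)
```

In a gradient-based setting (for example in JAX), `W_embed` and `W_out` would be passed to the optimiser while `W_res` is held constant. This contrasts with a classic ESN, where the input weights are also fixed at random and typically only the readout is fitted, often by linear regression.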
Abstract
Learning from Demonstration (LfD) is a useful paradigm for training policies that solve tasks involving complex motions, such as those encountered in robotic manipulation. In practice, the successful application of LfD requires overcoming error accumulation during policy execution, i.e., the problem of drift due to errors compounding over time and the consequent out-of-distribution behaviours. Existing works seek to address this problem by scaling data collection, correcting policy errors with a human in the loop, temporally ensembling policy predictions, or learning a dynamical system model with convergence guarantees. In this work, we propose and validate an alternative approach to overcoming this issue. Inspired by reservoir computing, we develop a recurrent neural network layer that includes a fixed nonlinear dynamical system with tunable dynamical properties for modelling temporal dynamics. We validate the efficacy of our neural network layer on the task of reproducing human handwriting motions using the LASA Human Handwriting Dataset. Through experiments, we demonstrate that incorporating our layer into existing neural network architectures addresses the issue of compounding errors in LfD. Furthermore, we perform a comparative evaluation against existing approaches, including a temporal ensemble of policy predictions and an Echo State Network (ESN) implementation. We find that our approach yields greater policy precision and robustness on the handwriting task while also generalising to multiple dynamics regimes and maintaining competitive latency scores.
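The temporal-ensembling baseline mentioned above can be illustrated with a short sketch. We assume a scheme in the style of Action Chunking with Transformers (ACT): at every step the policy predicts a chunk of `k` future actions, and the action executed at step `t` is a weighted average of every chunk prediction covering `t`, with weights `w_i = exp(-m * i)` where `i = 0` is the oldest prediction. The function name `temporal_ensemble`, the dictionary layout of `chunks`, and the constant `m` are illustrative assumptions. The paper's baseline may be configured differently.

```python
import numpy as np


def temporal_ensemble(chunks, t, m=0.1):
    """Combine overlapping action-chunk predictions for time step t (ACT-style, assumed).

    chunks: dict mapping the step s at which a chunk was predicted to an array
            of shape (k, action_dim) holding actions for steps s, ..., s + k - 1.
    Weights are w_i = exp(-m * i), with i = 0 for the oldest covering prediction.
    """
    preds = []
    for s in sorted(chunks):  # iterate oldest chunk first
        offset = t - s
        if 0 <= offset < len(chunks[s]):
            preds.append(chunks[s][offset])
    if not preds:
        raise ValueError(f"no chunk covers step {t}")
    preds = np.stack(preds)
    weights = np.exp(-m * np.arange(len(preds)))
    weights /= weights.sum()  # normalise so the result is a weighted mean
    return weights @ preds


# Usage: chunk length 4, each chunk filled with the step it was predicted at.
k, action_dim = 4, 2
chunks = {s: np.full((k, action_dim), float(s)) for s in range(6)}
print(temporal_ensemble(chunks, t=5, m=0.0))  # [3.5 3.5]
```

With `m = 0` the four chunks predicted at steps 2 to 5 are averaged uniformly; increasing `m` shifts weight toward older predictions, which smooths the executed action sequence.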
