Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez, Justus Piater

TL;DR

The paper tackles the dual challenge of stability and memory in continual learning from demonstration for robotics. It introduces a time-dependent stable NODE ($\mathit{s}$NODE) whose parameters are generated by a hypernetwork, producing both a nominal dynamics model and a Lyapunov function that together guarantee convergence. The authors show that stability not only yields convergent trajectories but also boosts continual learning performance, especially when using compact chunked hypernetworks. They further reduce the cumulative training cost for $N$ tasks from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ via a stochastic regularization strategy. Empirical results on LASA 2D and its high-dimensional variants, plus RoboTasks9, demonstrate strong trajectory accuracy, robust resistance to forgetting, and favorable model efficiency, with open-source code provided for reproducibility.
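How a Lyapunov function can enforce convergence on top of a nominal dynamics model may be easier to see in a small sketch. The code below assumes the projection-style construction commonly used for stable deep dynamics models: the nominal velocity is corrected whenever it would violate the decrease condition $\nabla V(x) \cdot f(x) \le -\alpha V(x)$. The quadratic `lyapunov`, the spiral `nominal_dynamics`, and the value of `ALPHA` are hypothetical stand-ins chosen for illustration; they are not the paper's learned networks, and the time input of the $\mathit{s}$NODE is omitted.

```python
ALPHA = 0.5  # assumed decay rate, chosen only for illustration


def lyapunov(x):
    """Hypothetical Lyapunov function V(x) = ||x||^2 with the goal at the origin."""
    return sum(xi * xi for xi in x)


def lyapunov_grad(x):
    """Gradient of V(x) = ||x||^2."""
    return [2.0 * xi for xi in x]


def nominal_dynamics(x):
    """Hypothetical unstable nominal model: a spiral that moves away from the goal."""
    return [0.3 * x[0] - x[1], x[0] + 0.3 * x[1]]


def stable_dynamics(x):
    """Correct the nominal velocity so that grad V . f <= -ALPHA * V holds."""
    f_hat = nominal_dynamics(x)
    grad = lyapunov_grad(x)
    grad_sq = sum(g * g for g in grad)
    if grad_sq == 0.0:
        return [0.0 for _ in x]  # at the goal: stay there
    # Positive when the nominal velocity breaks the decrease condition.
    violation = sum(g * f for g, f in zip(grad, f_hat)) + ALPHA * lyapunov(x)
    scale = max(0.0, violation) / grad_sq
    # Remove just enough of the component along grad V to restore the condition.
    return [f - scale * g for f, g in zip(f_hat, grad)]


def rollout(x0, dt=0.01, steps=2000):
    """Integrate the corrected dynamics with explicit Euler steps."""
    x = list(x0)
    for _ in range(steps):
        v = stable_dynamics(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


start = [1.0, -0.5]
end = rollout(start)
print("V at start:", lyapunov(start))
print("V at end:  ", lyapunov(end))
```

In this toy setting the uncorrected spiral diverges, while the corrected rollout shrinks toward the origin because the correction only removes the part of the velocity that points "uphill" on $V$.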

Abstract

Learning from demonstration (LfD) provides an efficient way to train robots. The learned motions should be convergent and stable, but to be truly effective in the real world, LfD-capable robots should also be able to remember multiple motion skills. Existing stable-LfD approaches lack the capability of multi-skill retention. Although recent work on continual-LfD has shown that hypernetwork-generated neural ordinary differential equation solvers (NODE) can learn multiple LfD tasks sequentially, this approach lacks stability guarantees. We propose an approach for stable continual-LfD in which a hypernetwork generates two networks: a trajectory learning dynamics model, and a trajectory stabilizing Lyapunov function. The introduction of stability generates convergent trajectories, but more importantly it also greatly improves continual learning performance, especially in the size-efficient chunked hypernetworks. With our approach, a single hypernetwork learns stable trajectories of the robot's end-effector position and orientation simultaneously, and does so continually for a sequence of real-world LfD tasks without retraining on past demonstrations. We also propose stochastic hypernetwork regularization with a single randomly sampled regularization term, which reduces the cumulative training time cost for $N$ tasks from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ without any loss in performance on real-world tasks. We empirically evaluate our approach on the popular LASA dataset, on high-dimensional extensions of LASA (including up to 32 dimensions) to assess scalability, and on a novel extended robotic task dataset (RoboTasks9) to assess real-world performance. In trajectory error metrics, stability metrics and continual learning metrics our approach performs favorably, compared to other baselines. Our open-source code and datasets are available at https://github.com/sayantanauddy/clfd-snode.
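The cost argument behind stochastic regularization can be sketched by counting regularization terms. The code below assumes that, while learning the $k$-th task, full regularization evaluates one term per previously learned task in every training iteration, whereas the stochastic variant evaluates a single term for one randomly chosen earlier task. The function names are hypothetical, and uniform sampling is an assumption made here for illustration.

```python
import random


def full_regularization_terms(num_tasks):
    """Per-iteration regularization terms, summed over all tasks,
    when every earlier task is regularized (k - 1 terms for task k)."""
    return sum(k - 1 for k in range(1, num_tasks + 1))


def stochastic_regularization_terms(num_tasks):
    """Same count when at most one earlier task is regularized per iteration."""
    return sum(min(k - 1, 1) for k in range(1, num_tasks + 1))


def sample_regularized_task(current_task, rng):
    """Pick one earlier task index uniformly at random (assumed sampling scheme).
    Returns None for the first task, which has nothing to regularize."""
    if current_task == 0:
        return None
    return rng.randrange(current_task)


for n in (9, 18):
    print(n, full_regularization_terms(n), stochastic_regularization_terms(n))
# prints:
# 9 36 8
# 18 153 17

rng = random.Random(0)
print(sample_regularized_task(5, rng))  # some index in {0, 1, 2, 3, 4}
```

Doubling the number of tasks roughly quadruples the full count ($N(N-1)/2$ terms) but only doubles the stochastic count ($N-1$ terms), which is the $\mathcal{O}(N^2)$ versus $\mathcal{O}(N)$ gap described in the abstract.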

Paper Structure (36 sections, 14 equations, 28 figures, 7 tables)

Figures (28)

  • Figure 1: Overview of key results and our proposed approach. (a) Continual learning from demonstration with stable NODEs generated by a chunked hypernetwork (CHN$\rightarrow$$\mathit{s}$NODE) outperforms NODE-based continual learning (CHN$\rightarrow$NODE) by a wide margin (details in the experiments and results section). (b) Stochastic regularization with a single regularization term (CHN-1) leads to a CHN$\rightarrow$$\mathit{s}$NODE model that performs as well as the fully regularized model (CHN-all) on real-world tasks but reduces the training cost of $N$ tasks from $\mathcal{O}(N^2)$ to $\mathcal{O}(N)$ (details in the experiments and results section). (c) Architecture of a CHN$\rightarrow$$\mathit{s}$NODE model: a chunked hypernetwork (CHN) $\hat{\mathbf{f}}_\mathbf{h}$ generates the parameters $\upphi=\{\uptheta, \upgamma\}$ of a stable NODE ($\mathit{s}$NODE) $\hat{\mathbf{f}}_\upphi$, comprising a nominal dynamics model $\hat{\mathbf{f}}_\uptheta$ and a Lyapunov function $V_\upgamma$. Task-specific learned parameters, regularized (task-independent) learned parameters, and non-trainable inputs/outputs are distinguished by color in the figure (details in the method section). (d) Illustrations of the nine real-world tasks of our proposed RoboTasks9 dataset. The last 5 tasks are introduced in this paper (details in the experiment setup section). With our proposed approach, all tasks can be learned in a continual manner with a single hypernetwork model without retraining on past demonstrations, with minimal forgetting, and with stability in the predicted trajectories.
  • Figure 2: Time-dependent stable NODE ($\mathit{s}$NODE) architecture: a time input is added to the $\mathit{s}$NODE, resulting in more accurate predictions (changes are shown in purple).
  • Figure 3: Trajectories of position and orientation are learned simultaneously by projecting the orientation quaternions into rotation vectors using the Log map, learning trajectories of the positions and rotation vectors in Euclidean space, and then projecting the predicted rotation vectors back into quaternions with the help of the Exp map.
  • Figure 4: (a) Architecture of the HN$\rightarrow$$\mathit{s}$NODE model. Parameters $\uptheta$ and $\upgamma$ of the nominal dynamics model $\hat{\mathbf{f}}_{\uptheta}$ and the Lyapunov function $V_\upgamma$, respectively, of the $\mathit{s}$NODE are generated by the final layer of the Hypernetwork. (b) Architecture of the CHN$\rightarrow$$\mathit{s}$NODE model. Parameters $\uptheta$ and $\upgamma$ of the $\mathit{s}$NODE are generated in chunks by the Chunked Hypernetwork, allowing for a smaller hypernetwork size. For (a) and (b), the architecture of the $\mathit{s}$NODE is the same as in Fig. 2 and is shown with muted colors here. Task-specific learned parameters, regularized (task-independent) learned parameters, and non-trainable inputs/outputs are distinguished by color in the figure. Contrary to a stand-alone $\mathit{s}$NODE, the parameters of the $\mathit{s}$NODE are not directly trainable, but are simply the outputs of the hypernetworks.
  • Figure 5: DTW errors (lower is better) of all predictions while learning the LASA 2D tasks. The bottom row shows a zoomed-in view of the top plot, and the dashed gray line is a reference for comparing the scales of the two plots. Solid boxes depict $\mathit{s}$NODE and hatched boxes depict NODE. With $\mathit{s}$NODE as the task learner, HN and CHN outperform regularization-based methods (SI, MAS), and perform on par with the upper baselines SG and REP. For SG, there is no perceivable difference between NODE and $\mathit{s}$NODE, but $\mathit{s}$NODE improves the continual learning performance of REP, HN and, most considerably, that of the smallest model, CHN. Results shown are obtained with 5 independent seeds.
  • ...and 23 more figures
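
The orientation handling described in the Figure 3 caption can be sketched as a round trip between unit quaternions and rotation vectors. The code below assumes quaternions in `(w, x, y, z)` order and the axis-times-angle convention for rotation vectors; the function names are hypothetical, and the paper's implementation may use a different convention (for example, a half-angle Log map or orientations expressed relative to a goal).

```python
import math


def quat_log(q):
    """Map a unit quaternion (w, x, y, z) to a rotation vector (axis * angle)."""
    w, x, y, z = q
    if w < 0.0:
        # q and -q encode the same rotation; use the one with the smaller angle.
        w, x, y, z = -w, -x, -y, -z
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        return (0.0, 0.0, 0.0)  # identity rotation
    angle = 2.0 * math.atan2(norm_v, w)
    scale = angle / norm_v
    return (scale * x, scale * y, scale * z)


def quat_exp(r):
    """Map a rotation vector (axis * angle) back to a unit quaternion (w, x, y, z)."""
    angle = math.sqrt(sum(c * c for c in r))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)  # identity rotation
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), s * r[0], s * r[1], s * r[2])


# A 60-degree rotation about the y-axis.
q = (math.cos(math.pi / 6), 0.0, math.sin(math.pi / 6), 0.0)
r = quat_log(q)          # Euclidean rotation vector, learnable alongside position
q_back = quat_exp(r)     # projected back to a quaternion after prediction
print(r)
print(q_back)
```

Because rotation vectors live in ordinary Euclidean space, they can be concatenated with the position and passed to a single dynamics model, which is the purpose of the projection described in the caption.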