Learning optimal controllers: a dynamical motion primitive approach

Hugo T. M. Kussaba; Abdalla Swikir; Fan Wu; Anastasija Demerdjieva; Gitta Kutyniok; Sami Haddadin

TL;DR

The paper tackles real-time optimal control in robotics, where solving many optimal control problems (OCPs) online is computationally prohibitive. Working within a dynamic movement primitive (DMP)-based framework that learns near-optimal controllers from a grid of optimal trajectories, it uses a first-order estimate of the value function $V$ and its gradient to estimate the suboptimality of out-of-sample trajectories. A cost-aware sampling algorithm leverages the backward OCP and the sensitivity term $\partial \tilde{V}/\partial x$ to build a non-uniform grid $\hat{\mathbb P}$, reducing the number of training samples and the storage needed. Numerical simulations show accurate suboptimality estimates and substantial sample savings, with potential applications to embedded, resource-constrained real-time controllers.
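
To make the first-order estimate concrete, here is a minimal sketch on an assumed toy problem (not one from the paper): the scalar integrator $\dot x = u$ on $[0, T]$ with cost $\int_0^T u^2\,dt$, steered from $x_0$ to $x_f$. Its optimal cost has the closed form $V(x_f) = (x_f - x_0)^2/T$, so the value and gradient stored at a sampled endpoint can be compared with the exact cost at a nearby, unsampled endpoint. The function names are hypothetical.

```python
"""First-order (Taylor) estimate of an optimal cost at an unsampled endpoint.

Toy problem (an assumption for illustration, not the paper's example):
    dynamics  x'(t) = u(t),  t in [0, T]
    cost      J = integral_0^T u(t)^2 dt
    endpoints x(0) = x0, x(T) = xf
The optimal control is constant, u* = (xf - x0) / T, so V(xf) = (xf - x0)^2 / T.
"""


def optimal_cost(x0: float, xf: float, T: float) -> float:
    """Exact optimal cost V(xf) of the toy problem."""
    return (xf - x0) ** 2 / T


def cost_gradient(x0: float, xf: float, T: float) -> float:
    """Sensitivity dV/dxf of the optimal cost with respect to the terminal point."""
    return 2.0 * (xf - x0) / T


def first_order_estimate(x0: float, xf_sample: float, xf_query: float,
                         T: float) -> float:
    """Estimate V(xf_query) from the value and gradient stored at xf_sample."""
    value = optimal_cost(x0, xf_sample, T)
    slope = cost_gradient(x0, xf_sample, T)
    return value + slope * (xf_query - xf_sample)


if __name__ == "__main__":
    x0, T = 0.0, 2.0
    est = first_order_estimate(x0, xf_sample=1.0, xf_query=1.2, T=T)
    exact = optimal_cost(x0, 1.2, T)
    # The gap is the second-order remainder (1.2 - 1.0)**2 / T = 0.02.
    print(f"estimate={est:.4f} exact={exact:.4f} gap={exact - est:.4f}")
    # prints: estimate=0.7000 exact=0.7200 gap=0.0200
```

Because this toy value is quadratic, the gap between the exact cost and the first-order estimate is exactly the second-order remainder $(x_f' - x_f)^2/T$; in general, the quality of such an estimate degrades as the query moves away from the sampled point.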

Abstract

Real-time computation of optimal control is a challenging problem, and to address it many frameworks have proposed using learning techniques to learn (possibly suboptimal) controllers that can be used online. Among these techniques, the optimal motion framework is a simple yet powerful technique that has succeeded in many complex real-world applications. Its main idea is to take advantage of dynamic motion primitives, a widely used tool in robotics for learning trajectories from demonstrations. While these demonstrations usually come from humans, the optimal motion framework relies on demonstrations coming from optimal solutions, such as those obtained by numerical solvers. As with many learning techniques, a drawback of this approach is that the suboptimality of learned solutions is hard to estimate, since finding easily computable and non-trivial upper bounds on the error between an optimal solution and a learned solution is, in general, infeasible. However, we show in this paper that it is possible to estimate this error for a broad class of problems. Furthermore, we apply this estimation technique to obtain a novel and more efficient sampling scheme for use within the optimal motion framework, enabling its use in some scenarios where computational resources are limited.
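
The core mechanism described above, fitting a DMP to a demonstration that comes from an optimal solution rather than from a human, can be sketched as follows. This assumes the widely used Ijspeert-style discrete DMP formulation (the paper's exact formulation may differ) and, as a stand-in for a numerical solver's output, uses the closed-form minimum-energy rest-to-rest trajectory of a double integrator, $y(t) = 3s^2 - 2s^3$ with $s = t/T$. The class `DMP1D`, the helper `minimum_energy_demo`, and all gain values are hypothetical choices for illustration.

```python
import numpy as np


def minimum_energy_demo(T=1.0, n=1001):
    """Minimum-energy rest-to-rest motion of a double integrator from 0 to 1
    (a stand-in for an optimal demonstration from a numerical solver)."""
    t = np.linspace(0.0, T, n)
    s = t / T
    y = 3 * s**2 - 2 * s**3
    dy = (6 * s - 6 * s**2) / T
    ddy = (6 - 12 * s) / T**2
    return t, y, dy, ddy


class DMP1D:
    """Minimal one-dimensional discrete DMP (Ijspeert-style; hypothetical gains).

    tau * dz/dt = alpha_z * (beta_z * (g - y) - z) + f(x),   tau * dy/dt = z
    x(t) = exp(-alpha_x * t / tau)
    f(x) = sum_i psi_i(x) w_i / sum_i psi_i(x) * x * (g - y0)
    """

    def __init__(self, n_basis=50, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z, self.beta_z, self.alpha_x = alpha_z, alpha_z / 4.0, alpha_x
        # Centres evenly spaced in time, i.e. exponentially spaced in phase x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        gaps = np.abs(np.diff(self.c))
        self.h = 1.0 / np.append(gaps, gaps[-1]) ** 2
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        """Gaussian basis activations, shape (len(x), n_basis)."""
        x = np.atleast_1d(x)
        return np.exp(-self.h * (x[:, None] - self.c) ** 2)

    def fit(self, t, y, dy, ddy):
        """Learn the weights from one demonstration by locally weighted regression."""
        self.tau, self.y0, self.g = t[-1] - t[0], y[0], y[-1]
        x = np.exp(-self.alpha_x * (t - t[0]) / self.tau)
        # Forcing term that makes the spring-damper reproduce the demonstration.
        f_target = self.tau**2 * ddy - self.alpha_z * (
            self.beta_z * (self.g - y) - self.tau * dy)
        s = x * (self.g - self.y0)
        psi = self._psi(x)
        self.w = (psi.T @ (s * f_target)) / (psi.T @ s**2 + 1e-12)
        return self

    def rollout(self, goal, t_end, dt=1e-3):
        """Explicit-Euler integration from rest at y0 towards `goal`."""
        ts = np.arange(0.0, t_end, dt)
        ys = np.empty_like(ts)
        y, z = self.y0, 0.0
        for k, t in enumerate(ts):
            ys[k] = y
            x = np.exp(-self.alpha_x * t / self.tau)
            psi = self._psi(x)[0]
            f = psi @ self.w / psi.sum() * x * (goal - self.y0)
            dz = (self.alpha_z * (self.beta_z * (goal - y) - z) + f) / self.tau
            y, z = y + z / self.tau * dt, z + dz * dt
        return ts, ys


if __name__ == "__main__":
    t, y, dy, ddy = minimum_energy_demo()
    dmp = DMP1D().fit(t, y, dy, ddy)
    ts, ys = dmp.rollout(goal=1.0, t_end=3.0)
    print("max tracking error on [0, T]:", np.max(np.abs(ys[: len(t)] - y)))
    _, ys_new = dmp.rollout(goal=2.0, t_end=3.0)
    print("final position for new goal 2.0:", ys_new[-1])
```

Because the forcing term is scaled by $g - y_0$, the same learned weights can be rolled out toward a new goal; in this sketch, with $y_0 = 0$, a goal of 2 yields exactly twice the trajectory obtained for a goal of 1.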

Paper Structure

This paper contains 11 sections, 33 equations, and 3 figures.

Introduction
Preliminaries
Optimal control problems
Review of DMP-based optimal motion framework
Dynamic movement primitives
Learning controllers with optimal motion primitives
Estimation of Optimal Value and Sampling Algorithm
First-order approximation to the value function
Sampling algorithm description
Numerical Simulation
Conclusion

Figures (3)

Figure 1: Diagram illustrating the steps for computing an estimate of the cost of the optimal trajectory that has the same endpoints as an out-of-sample trajectory.
Figure 2: (a) Application of the sampling algorithm to the example system. The dashed vertical lines indicate the terminal points at which a new DMP should be used, while the points between the lines are out-of-sample learned trajectories. (b) Error between the estimated optimal cost and the true optimal cost. (c) Difference between the cost of the DMP trajectory and the true optimal cost.
Figure 3: (a) Grid obtained by the proposed Algorithm 1, which uses only 15 samples. (b) Uniform grid whose spacing equals the minimal distance between sampling points of the grid obtained by Algorithm 1; it uses 39 samples.
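
Figure 3 contrasts a cost-aware non-uniform grid with a uniform grid at the minimal spacing. The sketch below illustrates that comparison with a hypothetical greedy stand-in, not the paper's Algorithm 1 (which relies on a backward OCP and the sensitivity $\partial \tilde{V}/\partial x$). It assumes a synthetic value function $V(x) = x^4$ that can be evaluated at candidate terminal points, and it starts a new sample whenever the first-order estimate from the last sample misses $V$ by more than a tolerance. The function names, the value function, and the tolerance are all assumptions.

```python
import numpy as np


def quartic_value(x):
    """Synthetic value function V(x) = x^4 (an assumption, not from the paper)."""
    return x ** 4


def quartic_gradient(x):
    """Gradient of the synthetic value function."""
    return 4 * x ** 3


def greedy_first_order_grid(V, dV, a, b, tol, n_candidates=2001):
    """Hypothetical greedy sampler on [a, b] (a stand-in, not Algorithm 1).

    Every candidate x stays within `tol` of the first-order estimate
    V(xi) + dV(xi) * (x - xi) from the nearest sample xi to its left,
    up to the resolution of the candidate lattice.
    """
    cand = np.linspace(a, b, n_candidates)
    grid, j = [cand[0]], 1
    while j < n_candidates:
        xi, x = grid[-1], cand[j]
        err = abs(V(x) - (V(xi) + dV(xi) * (x - xi)))
        if err > tol and cand[j - 1] > xi:
            # New sample at the last candidate that was still accurate;
            # j is not advanced, so x is re-checked against it.
            grid.append(cand[j - 1])
        else:
            j += 1
    if grid[-1] < cand[-1]:
        grid.append(cand[-1])  # always include the right end point
    return np.array(grid)


def uniform_sample_count(grid):
    """Samples a uniform grid needs at the minimal spacing of `grid`.

    The final interval is ignored when others exist, because it is truncated
    at the end point and can be arbitrarily short.
    """
    gaps = np.diff(grid)
    h = gaps[:-1].min() if len(gaps) > 1 else gaps.min()
    return int(np.ceil((grid[-1] - grid[0]) / h)) + 1


if __name__ == "__main__":
    grid = greedy_first_order_grid(quartic_value, quartic_gradient, 0.0, 2.0, tol=0.05)
    print(f"greedy grid: {len(grid)} samples; "
          f"uniform grid: {uniform_sample_count(grid)} samples")
```

Where the value function is nearly flat, a single sample covers a wide range, so the greedy grid is dense only where the curvature is high, whereas the uniform grid must use the smallest spacing everywhere; this is the kind of saving that Figure 3 reports.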

Theorems & Definitions (2)

Proof 1
Proof 2