Learning optimal controllers: a dynamical motion primitive approach
Hugo T. M. Kussaba, Abdalla Swikir, Fan Wu, Anastasija Demerdjieva, Gitta Kutyniok, Sami Haddadin
TL;DR
The paper tackles real-time optimal control in robotics, where repeatedly solving optimal control problems (OCPs) online is often computationally prohibitive. It introduces a framework based on dynamic motion primitives (DMPs) for learning near-optimal controllers from a grid of precomputed optimal trajectories, and uses a first-order estimate of the value function $V$ and its gradient to estimate the suboptimality of out-of-sample (learned) trajectories. A cost-aware sampling algorithm leverages the backward OCP and the sensitivity term $\partial \tilde{V}/\partial x$ to build a non-uniform grid $\hat{\mathbb P}$, reducing both the number of training samples and the storage required. Numerical experiments show accurate suboptimality estimates and substantial savings in samples, with potential applications to embedded, resource-constrained real-time controllers.
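To make the idea of a first-order value-function estimate concrete, here is a toy sketch; it is not the authors' estimator. It assumes a hypothetical scalar linear-quadratic problem, chosen only because its value function $V(x_0) = P x_0^2$ has a closed form, and uses hypothetical helper names (`value`, `value_grad`, `first_order_value`, `estimated_suboptimality`). The "grid" plays the role of initial states where the OCP has been solved, and a deliberately suboptimal feedback gain stands in for a learned controller.

```python
import math

# Hypothetical toy problem (not from the paper): scalar system dx/dt = A*x + B*u
# with infinite-horizon cost J = integral of (Q*x^2 + R*u^2) dt.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0

# Positive root of the scalar Riccati equation Q + 2*A*P - (B*P)**2 / R = 0;
# the optimal value function is then V(x0) = P * x0**2.
P = R * (A + math.sqrt(A**2 + B**2 * Q / R)) / B**2


def value(x0):
    """Exact optimal cost V(x0)."""
    return P * x0**2


def value_grad(x0):
    """Sensitivity dV/dx0 of the optimal cost."""
    return 2.0 * P * x0


def cost_of_gain(k, x0):
    """Closed-form cost of the linear feedback u = -k*x (needs B*k > A for stability)."""
    if B * k <= A:
        raise ValueError("closed loop is unstable; cost is infinite")
    return (Q + R * k**2) * x0**2 / (2.0 * (B * k - A))


def first_order_value(x0, grid):
    """First-order Taylor estimate of V(x0) around the nearest solved grid point."""
    xg = min(grid, key=lambda g: abs(g - x0))
    return value(xg) + value_grad(xg) * (x0 - xg)


def estimated_suboptimality(cost, x0, grid):
    """Gap between a learned controller's cost and the estimated optimal cost."""
    return cost - first_order_value(x0, grid)


grid = [0.0, 0.5, 1.0, 1.5, 2.0]  # initial states where the OCP was "solved"
x0, k_learned = 1.2, 3.0          # out-of-sample state and a suboptimal learned gain
J = cost_of_gain(k_learned, x0)
true_gap = J - value(x0)
est_gap = estimated_suboptimality(J, x0, grid)
print(f"true gap {true_gap:.4f}, first-order estimate {est_gap:.4f}")
# prints: true gap 0.1235, first-order estimate 0.2201
```

Because $V$ is convex in this toy, the tangent-line estimate never exceeds the true $V$, so the estimated gap is conservative (it overestimates the true gap). Whether and how this carries over to the paper's problem class is not something this sketch establishes.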
Abstract
Real-time computation of optimal controls is a challenging problem. To overcome this difficulty, many frameworks have been proposed that use learning techniques to learn (possibly suboptimal) controllers that can then be used online. Among these techniques, the optimal motion framework is a simple yet powerful approach that has been successful in many complex real-world applications. Its main idea is to take advantage of dynamic motion primitives, a tool widely used in robotics for learning trajectories from demonstrations. While these demonstrations usually come from humans, the optimal motion framework uses demonstrations given by optimal solutions, such as those obtained by numerical solvers. As with many learning techniques, a drawback of this approach is that the suboptimality of learned solutions is hard to estimate, since finding easily computable, non-trivial upper bounds on the error between an optimal solution and a learned one is, in general, infeasible. In this paper, however, we show that this error can be estimated for a broad class of problems. Furthermore, we use this estimation technique to derive a novel, more efficient sampling scheme for the optimal motion framework, enabling its use in scenarios where computational resources are limited.
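The mechanism the abstract relies on, a DMP learned from a demonstration, can be sketched with the standard discrete formulation: a critically damped spring-damper pulled toward a goal and shaped by a forcing term built from Gaussian basis functions of a decaying phase variable. This is a minimal one-dimensional sketch under stated assumptions, not the paper's implementation: the class name `DMP1D`, the gains, and the number of basis functions are illustrative choices, and a minimum-jerk curve stands in for a trajectory produced by an optimal control solver. The paper's framework learns from a grid of such solutions; this sketch fits only one.

```python
import numpy as np


class DMP1D:
    """Minimal one-dimensional discrete DMP (illustrative sketch, hypothetical name)."""

    def __init__(self, n_basis=30, alpha_z=25.0, alpha_x=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0  # critical damping of the spring-damper
        self.alpha_x = alpha_x       # decay rate of the phase variable x
        # Gaussian basis centres, evenly spaced in time and mapped to phase x.
        self.c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n_basis))
        d = np.diff(self.c)
        self.h = 1.0 / np.append(d, d[-1]) ** 2  # widths from centre spacing
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        # Basis activations; x may be a scalar or a 1-D array of phases.
        return np.exp(-self.h * (np.atleast_1d(x)[:, None] - self.c) ** 2)

    def fit(self, y, dt):
        """Learn forcing-term weights from one demonstration y sampled every dt."""
        self.y0, self.g = y[0], y[-1]
        self.tau = (len(y) - 1) * dt
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.alpha_x * np.arange(len(y)) * dt / self.tau)
        # Forcing term that makes the spring-damper reproduce the demonstration.
        f_target = self.tau**2 * ydd - self.alpha_z * (
            self.beta_z * (self.g - y) - self.tau * yd)
        s = x * (self.g - self.y0)
        psi = self._psi(x)  # shape (samples, n_basis)
        # Locally weighted regression: one independent weight per basis function.
        self.w = (psi * (s * f_target)[:, None]).sum(0) / (
            (psi * (s**2)[:, None]).sum(0) + 1e-10)

    def rollout(self, dt, goal=None):
        """Integrate the DMP with explicit Euler steps, optionally toward a new goal."""
        g = self.g if goal is None else goal
        y, z, x = self.y0, 0.0, 1.0
        traj = [y]
        for _ in range(int(round(self.tau / dt))):
            psi = self._psi(x)[0]
            f = psi @ self.w / (psi.sum() + 1e-10) * x * (g - self.y0)
            dz = (self.alpha_z * (self.beta_z * (g - y) - z) + f) / self.tau
            dy = z / self.tau
            dx = -self.alpha_x * x / self.tau
            y, z, x = y + dy * dt, z + dz * dt, x + dx * dt
            traj.append(y)
        return np.array(traj)


dt = 0.001
t = np.linspace(0.0, 1.0, 1001)
y_demo = 10 * t**3 - 15 * t**4 + 6 * t**5  # minimum-jerk stand-in for a solver output
dmp = DMP1D()
dmp.fit(y_demo, dt)
y_rep = dmp.rollout(dt)              # reproduce the demonstration
y_new = dmp.rollout(dt, goal=2.0)    # generalize to a new goal
print(np.max(np.abs(y_rep - y_demo)), y_new[-1])
```

Because the forcing term is scaled by the goal offset $(g - y_0)$ and the remaining dynamics are linear, changing the goal rescales the learned motion instead of requiring a new demonstration; this is one reason DMPs are a convenient container for reusing solver output online.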
