Real-time Sampling-based Model Predictive Control based on Reverse Kullback-Leibler Divergence and Its Adaptive Acceleration

Taisuke Kobayashi; Kota Fukumoto

TL;DR

The paper tackles the real-time constraints of sampling-based MPC by reframing the optimization around reverse KL divergence, which promotes rapid convergence to a local optimum. It develops Reverse MPC, solved via mirror descent, which allows candidates to carry both positive and negative weights, and introduces Reject MPC, which mitigates interference between the positive and negative updates through a decomposed policy and pseudo-rejection sampling. To accelerate convergence, it adapts AGD+ to a dynamic mirror space, yielding Accel MPC with an adaptive step size that accounts for gradient noise. Experiments in Brax and on a force-driven mobile robot demonstrate improved task coverage, smoother real-time control on a CPU, and practical applicability, especially for higher-DOF problems and real-time operation at 50 Hz.

Abstract

Sampling-based model predictive control (MPC) has the potential for use in a wide variety of robotic systems. However, its unstable updates and poor convergence render it unsuitable for real-time control of such systems. This study addresses this challenge with a novel approach based on reverse Kullback-Leibler divergence, which has a mode-seeking property and is therefore likely to find one of the locally optimal solutions early. This approach yields a weighted maximum likelihood estimation with both positive and negative weights, which is solved using the mirror descent (MD) algorithm. Negative weights eliminate unnecessary actions, but the positive and negative updates can interfere with each other, so a practical implementation based on rejection sampling is designed to avoid this interference. In addition, Nesterov's acceleration method is modified for the proposed MD so that its heuristic step size adapts to the noise estimated in the update amounts. Real-time simulations show that the proposed method solves a statistically wider variety of tasks than the conventional method. Moreover, with the improved acceleration, higher degrees-of-freedom tasks can be solved even on a CPU alone. The real-world applicability of the proposed method is also demonstrated by optimizing the operability of variable impedance control for a force-driven mobile robot. Video: https://youtu.be/D8bFMzct1XM
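
The mode-seeking property mentioned in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it fits a single fixed-width Gaussian to a hypothetical bimodal target on a 1-D grid by brute-force search over the mean, once under forward KL and once under reverse KL. The target mixture, the grid, the fixed standard deviation, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


# Integration grid (assumed wide enough to hold almost all probability mass).
x = np.linspace(-8.0, 8.0, 3201)
dx = x[1] - x[0]

# Hypothetical bimodal target pi*: equal-weight mixture with modes at -2 and +2.
target = 0.5 * gaussian_pdf(x, -2.0, 0.5) + 0.5 * gaussian_pdf(x, 2.0, 0.5)


def kl(p, q):
    """Grid approximation of KL(p || q) = integral of p * (log p - log q)."""
    eps = 1e-300  # guards against log(0) if a density underflows
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx)


def fit_mean(direction, std=0.5):
    """Brute-force search for the Gaussian mean that minimizes the chosen KL."""
    candidate_means = np.linspace(-4.0, 4.0, 161)
    divergences = []
    for m in candidate_means:
        q = gaussian_pdf(x, m, std)
        if direction == "forward":
            divergences.append(kl(target, q))  # KL(pi* || pi): mass-covering
        else:
            divergences.append(kl(q, target))  # KL(pi || pi*): mode-seeking
    return float(candidate_means[int(np.argmin(divergences))])


mean_fkl = fit_mean("forward")
mean_rkl = fit_mean("reverse")
print("forward-KL mean:", mean_fkl)
print("reverse-KL mean:", mean_rkl)
```

Under these assumptions, the forward-KL fit places the mean near 0, between the two modes, whereas the reverse-KL fit places it near one of the modes at -2 or +2.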

Paper Structure (30 sections, 31 equations, 16 figures, 2 tables)

Introduction
Related work
Types of nonlinear MPC
Gradient-based approach
Sampling-based approach
Improvement of real-time performance
Efficient exploration
Extraction of lower dimension
Use of gradient-based method
Improvements at implementation level
Problem statement
Sampling-based model predictive control
Open issues
Basic derivation
Minimization of reverse Kullback-Leibler divergence
...and 15 more sections

Figures (16)

Figure 1: Overview of sampling-based MPC. Candidate action sequences are sampled from $\pi$ and evaluated using the given model. According to the total cost of each candidate, $\pi$ is updated so that it preferentially samples candidates with smaller costs. By repeating this process, $\pi$ converges to the optimal solution.
Figure 2: Convergence properties of FKL and RKL divergences. When minimizing FKL, $\pi$ obtains the mean of $\pi^\ast$, the so-called mass-covering property. When minimizing RKL, $\pi$ obtains one of the modes of $\pi^\ast$, the so-called mode-seeking property.
Figure 3: Overview of MD algorithm. The parameter in the primal space with constraints, $\theta$, is mapped into its mirror space without any constraints, $z$. The gradient w.r.t. $\theta$ is adopted to update $z$. The updated $z$ is again mapped into the primal space as the updated $\theta$.
Figure 4: Interference between positive and negative updates. If the repulsion from the drop samples is in the same direction as the attraction to the elite samples, the policy can converge to the optimal one. If not, the repulsion conflicts with the attraction, preventing the desired updates of the policy.
Figure 5: Proposal of Reject MPC. Two policies $\pi^{+,-}$ are updated to represent the elite and drop samples, respectively. They are composed as $\pi$ in a pseudo-rejection sampling manner.
...and 11 more figures
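
The sample, evaluate, and update loop described in the Figure 1 caption can be sketched as follows. This is a minimal illustration using a conventional elite-based (cross-entropy-method-style) update, not the Reverse MPC, Reject MPC, or Accel MPC proposed in the paper. The 1-D point-mass model, the quadratic cost, and every hyperparameter and function name are hypothetical.

```python
import numpy as np


def rollout_cost(x0, actions, dt=0.05):
    """Total cost of each candidate action sequence; actions has shape [N, H].

    Assumed model: 1-D point mass whose action is an acceleration.
    Assumed cost: quadratic penalty on position, velocity, and action.
    """
    n_candidates, horizon = actions.shape
    pos = np.full(n_candidates, x0[0], dtype=float)
    vel = np.full(n_candidates, x0[1], dtype=float)
    total = np.zeros(n_candidates)
    for t in range(horizon):
        a = actions[:, t]
        vel = vel + dt * a
        pos = pos + dt * vel
        total += pos**2 + 0.1 * vel**2 + 0.01 * a**2
    return total


def plan(x0, mean, std, rng, n_samples=64, n_elite=8, n_iters=5):
    """Refine a diagonal-Gaussian policy over action sequences."""
    for _ in range(n_iters):
        # 1) Sample candidate action sequences from the current policy.
        noise = rng.standard_normal((n_samples, mean.size))
        candidates = mean + std * noise
        # 2) Evaluate every candidate with the model.
        costs = rollout_cost(x0, candidates)
        # 3) Refit the policy to the lowest-cost (elite) candidates.
        elite = candidates[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps exploration alive
    return mean, std


rng = np.random.default_rng(0)
horizon = 15
x0 = np.array([1.0, 0.0])  # start 1 unit from the origin, at rest
mean0 = np.zeros(horizon)
std0 = np.ones(horizon)

cost_before = rollout_cost(x0, mean0[None, :])[0]
mean, std = plan(x0, mean0, std0, rng)
cost_after = rollout_cost(x0, mean[None, :])[0]
print("cost of initial mean:", cost_before)
print("cost of refined mean:", cost_after)
```

In a receding-horizon controller, only the first action of the refined mean would typically be applied before replanning from the next state. The number of iterations that fit in one control period is what limits real-time use, which is the bottleneck the paper targets.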
