Data-driven Acceleration of MPC with Guarantees

Agustin Castellano, Shijie Pan, Enrique Mallada

TL;DR

The paper introduces a data-driven framework to accelerate Model Predictive Control by replacing online optimization with a nonparametric policy learned from offline MPC solutions, grounded on a conservative offline problem and an erosion-based feasibility region. Key contributions include a dataset-driven policy that selects actions via a nearest-neighbor-like rule with a cost-to-go regularization, plus rigorous conditions guaranteeing recursive feasibility and bounded suboptimality that tighten with more offline data. The approach delivers large speedups (100–1000x) in online control with modest suboptimality and is demonstrated on benchmark tasks using GPU-accelerated inference, highlighting its potential for real-time control applications. Overall, the work provides a principled data-driven trade-off between data coverage and performance, supported by two algorithms for data collection and domain verification that yield certified policies.

Abstract

Model Predictive Control (MPC) is a powerful framework for optimal control but can be too slow for low-latency applications. We present a data-driven framework to accelerate MPC by replacing online optimization with a nonparametric policy constructed from offline MPC solutions. Our policy is greedy with respect to a constructed upper bound on the optimal cost-to-go, and can be implemented as a nonparametric lookup rule that is orders of magnitude faster than solving MPC online. Our analysis shows that, under a sufficient coverage condition on the offline data, the policy is recursively feasible and admits a provable, bounded optimality gap. These conditions establish an explicit trade-off between the amount of data collected and the tightness of the bounds. Our experiments show that this policy is between 100 and 1000 times faster than standard MPC, with only a modest loss of optimality, showing potential for real-time control tasks.
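The lookup rule described in the abstract can be illustrated with a minimal sketch. It assumes the upper bound on the cost-to-go takes the form $J_i + \lambda \lVert \mathbf{x} - \mathbf{x}_i \rVert$ over stored triplets $(\mathbf{x}_i, \mathbf{u}_i, J_i)$; that form, the function name `lookup_policy`, and the toy dataset are assumptions made for illustration and may differ from the paper's exact definition. The sketch runs on the CPU in pure Python, whereas the paper reports GPU-accelerated inference.

```python
import math

def lookup_policy(x, dataset, lam):
    """Pick the stored control whose bound J_i + lam * ||x - x_i|| is smallest.

    dataset is a list of (state, control, cost_to_go) triplets (hypothetical layout).
    Returns the selected control and the value of the bound at x.
    """
    if not dataset:
        raise ValueError("dataset must contain at least one triplet")

    def bound(record):
        state, _control, cost = record
        return cost + lam * math.dist(x, state)

    best = min(dataset, key=bound)
    return best[1], bound(best)

# Toy data: three offline solutions in a 2-D state space (made-up numbers).
dataset = [
    ((0.0, 0.0), (0.0,), 0.0),
    ((1.0, 0.0), (-0.5,), 1.0),
    ((0.0, 2.0), (0.3,), 4.0),
]

u, bound = lookup_policy((0.9, 0.1), dataset, lam=2.0)
```

In this toy query the nearest stored state is also the minimizer, but a larger stored cost-to-go can outweigh proximity, which is what distinguishes the rule from plain nearest-neighbor search.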

Paper Structure

This paper contains 20 sections, 9 theorems, 46 equations, 4 figures, 2 algorithms.

Key Result

Proposition 1

Suppose Assumption \ref{assn:lipschitz-dynamics} holds, the stage cost $c(\cdot,\cdot)$ is $L_c$-Lipschitz, and $\gamma \max\{L_f,L_u\} < 1$. Then Assumption \ref{assn:J-locally-lipschitz}.\ref{assn:J-lip-2} holds with $L_J \leq \frac{L_c}{1-\gamma \max\{L_f,L_u\}}$.
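As a quick sanity check of the bound in Proposition 1, here is a worked instance with hypothetical constants (the values of $L_c$, $\gamma$, $L_f$ and $L_u$ below are chosen only for illustration and do not come from the paper):

```latex
% Hypothetical constants: L_c = 2, \gamma = 0.9, L_f = 1, L_u = 0.5
\gamma \max\{L_f, L_u\} = 0.9 \cdot 1 = 0.9 < 1
\quad\Longrightarrow\quad
L_J \;\leq\; \frac{L_c}{1 - \gamma \max\{L_f, L_u\}} = \frac{2}{1 - 0.9} = 20 .
```

With $L_f = 1.2$ instead, $\gamma \max\{L_f, L_u\} = 1.08 \geq 1$ and the proposition no longer applies. The bound also grows without limit as this product approaches 1 from below.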

Figures (4)

  • Figure 1: Left: Original constraint set $\mathbb{X}$ and its erosion $\mathbb{X}_{-\varepsilon}$. Middle: Feasibility certificates under our framework. The trajectory $\left(\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots\right)$ marked with '$\star$' is produced by our policy. Optimal transitions $\mathbf{y}\overset{\pi^\star}{\to}\mathbf{y}'$ and $\mathbf{z}\overset{\pi^\star}{\to}\mathbf{z}'$ are precomputed offline (by solving \ref{eq:conservative-prob}) and stored in a dataset $\mathcal{D}$. The control associated with each state (e.g. $\mathbf{y}$) in the dataset is also feasible in a neighborhood of that point (the ball with radius $r(\mathbf{y})$, see Prop. \ref{prop:local-feasibility}). Right: Performance guarantees for our policy (Theorem \ref{thm:performance-guarantees}). Each triplet $(\mathbf{x}_i,\mathbf{u}_i,\mathbf{J}_i)$ in the dataset certifies a ball $\mathbb{B}\left(\mathbf{x}_i, \tfrac{\beta\left(\mathbf{J}_i+\eta\right)}{\lambda\left(2+\beta\right)}\right)$, wherein $\tfrac{J^\pi(\mathbf{x})-J(\mathbf{x},\varepsilon)}{J(\mathbf{x},\varepsilon)+\eta} \leq \beta$ for any $\mathbf{x}$ in that ball.
  • Figure 2: Algorithm 2 and a visualization of the cell verification/splitting method. Each cell $\mathbb{X}_1,\dots,\mathbb{X}_9$ is tested against two criteria: (i) one-step feasibility and (ii) performance. Cells that pass are shown in green and kept in Verified; those that fail are shown in red and are split into $3^n$ child cells that are yet to be verified. The children are then re-verified using the same criteria and are either accepted (green) and moved to Verified, or split again (red), proceeding sequentially.
  • Figure 3: Statistics for the inverted pendulum (top) and minimum time problem (bottom) over 100 trajectories. Left: Per-step latency (in ms) for each controller. Our controllers (NN_1_XXXX) are ordered left to right from smallest to largest dataset $\mathcal{D}$. Middle: Distribution of the relative optimality gap. Boxes correspond to the interquartile range ($25\%$ to $75\%$), the black line shows the median, and the green arrow marks the mean. Right: Trade-off between computation time and relative gap for our method (red $\star$'s) and MPC (blue $\square$'s). Our method is substantially faster than MPC and, with sufficient data, outperforms MPC with shorter lookahead horizons.
  • Figure 4: Algorithm \ref{alg:verification} in action: feasibility (top row) and optimality (bottom row) certificates for the LQR problem. Iterations are ordered from left to right. Top: in each panel, we show in red the cells that don't satisfy the feasibility condition (Prop. \ref{prop:one-step-feasibility}). The algorithm recursively splits each cell and runs trajectories from the center points. Verified cells are shown in green. Bottom: After all cells have been deemed feasible, the algorithm verifies the optimality gap (Theorem \ref{thm:performance-guarantees}). Guaranteed suboptimal cells are shown in blue. At termination (bottom right panel) the algorithm has verified the whole state space $\mathbb{X}$.
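The cell verification and splitting loop described in the captions of Figures 2 and 4 can be sketched roughly as follows. This is an illustrative outline, not the paper's Algorithm 2: cells are assumed to be axis-aligned boxes, the predicate `passes` is a hypothetical stand-in for the one-step feasibility and performance checks, and the `max_depth` cap is an added assumption to guarantee termination.

```python
from itertools import product

def split_cell(lower, upper):
    """Split an axis-aligned box into 3**n equal child boxes (n = dimension)."""
    n = len(lower)
    widths = [(hi - lo) / 3.0 for lo, hi in zip(lower, upper)]
    children = []
    for idx in product(range(3), repeat=n):
        child_lo = [lo + k * w for lo, k, w in zip(lower, idx, widths)]
        child_hi = [lo + (k + 1) * w for lo, k, w in zip(lower, idx, widths)]
        children.append((child_lo, child_hi))
    return children

def verify_domain(root, passes, max_depth=5):
    """Accept cells that pass; split the rest until max_depth is reached.

    passes(lower, upper) -> bool is a hypothetical placeholder for the
    feasibility and performance criteria applied to one cell.
    """
    verified, unresolved = [], []
    pending = [(root, 0)]
    while pending:
        (lower, upper), depth = pending.pop()
        if passes(lower, upper):
            verified.append((lower, upper))
        elif depth < max_depth:
            pending.extend((child, depth + 1) for child in split_cell(lower, upper))
        else:
            unresolved.append((lower, upper))
    return verified, unresolved

# Toy criterion (made up): a cell passes once its widest side is at most 0.5.
def small_enough(lower, upper):
    return max(hi - lo for lo, hi in zip(lower, upper)) <= 0.5

verified, unresolved = verify_domain(([0.0, 0.0], [1.0, 1.0]), small_enough)
```

In the toy run the unit square fails the check, is split once into nine children, and all nine are accepted. A real criterion would depend on the dataset $\mathcal{D}$ rather than on cell size alone.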

Theorems & Definitions (21)

  • Proposition 1: Sufficient conditions for Assumption \ref{assn:J-locally-lipschitz}.\ref{assn:J-lip-2} (Lemma 3 in bucsoniu2018continuous)
  • Definition 1: Nonparametric policy
  • Remark 1: Policy is built from the conservative problem
  • Definition 2: One-step feasibility
  • Proposition 2: Local feasibility
  • Proposition 3: Recursive feasibility for our policy
  • Definition 3: Nonparametric upper & lower bounds on J
  • Theorem 1: Policy evaluation inequality
  • proof
  • Theorem 2: Performance guarantees
  • ...and 11 more