Policy Optimization for Unknown Systems using Differentiable Model Predictive Control

Riccardo Zuliani, Efe C. Balta, John Lygeros

TL;DR

This paper addresses policy optimization for MPC under unknown dynamics by proposing a hybrid gradient framework that blends differentiable, model-based gradients with zeroth-order updates derived from randomized smoothing. It establishes convergence to a Goldstein $\delta$-critical point using the notions of definability and conservative Jacobians, so the guarantee holds even when the available dynamics model is inaccurate. The approach is validated on a 12‑dimensional quadcopter task, where the hybrid method achieves fast transients and near‑optimal performance, outperforming purely model-based and purely model-free baselines. The work offers a practical path to data-efficient MPC design under model uncertainty and lays groundwork for stronger safety guarantees in future work.
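
For readers unfamiliar with the term, the standard notion of Goldstein criticality can be written as below. This is a sketch of the textbook definition, not a quotation from the paper: the symbol $\mathcal{C}$ for the closed-loop cost, the ball $\mathbb{B}_\delta(\theta)$, and the use of the Clarke subdifferential $\partial$ are assumptions here, and the paper may phrase its definition through conservative Jacobians instead.

```latex
% Goldstein \delta-subdifferential: convex hull of all subgradients
% taken within a ball of radius \delta around \theta.
\partial_\delta \mathcal{C}(\theta)
  = \operatorname{conv}\Big( \bigcup_{\theta' \in \mathbb{B}_\delta(\theta)} \partial \mathcal{C}(\theta') \Big),
\qquad
\theta \text{ is Goldstein } \delta\text{-critical if } 0 \in \partial_\delta \mathcal{C}(\theta).
```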

Abstract

Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies combining differentiable optimization with zeroth-order optimization. Our method combines model-based and model-free gradient estimation approaches, achieving faster transient performance compared to fully data-driven approaches while maintaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.

Paper Structure

This paper contains 15 sections, 7 theorems, 32 equations, 2 figures.

Key Result

lemma 1

For any $\theta\in\Theta$, $\mathbb{E}_v[J_{\mathcal{C}^\delta}(\theta,v)]=\nabla \mathcal{C}^\delta(\theta)$.
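
Lemma 1 says that the randomized estimator is unbiased for the gradient of the smoothed cost. The sketch below illustrates that property numerically with the standard two-point randomized-smoothing estimator, which is an assumption: the exact form of the paper's hybrid estimator $J_{\mathcal{C}^\delta}(\theta,v)$ is not given in this summary. The quadratic `quadratic_cost`, the matrix `A`, and all function names are hypothetical stand-ins for the closed-loop cost. For a quadratic, the gradient of the ball-smoothed cost coincides with the ordinary gradient `A @ theta`, so the Monte Carlo average can be checked against it.

```python
import numpy as np

def sample_unit_sphere(rng, dim):
    """Draw a direction v uniformly from the unit sphere in R^dim."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def two_point_estimate(f, theta, delta, v):
    """Standard two-point zeroth-order estimate along direction v.

    Its expectation over v is the gradient of the smoothed function
    f_delta(theta) = E_u[f(theta + delta * u)], with u uniform in the unit ball.
    """
    d = theta.size
    return (d / (2.0 * delta)) * (f(theta + delta * v) - f(theta - delta * v)) * v

def smoothed_gradient(f, theta, delta, num_samples, rng):
    """Average many single-direction estimates (Monte Carlo expectation over v)."""
    total = np.zeros_like(theta)
    for _ in range(num_samples):
        v = sample_unit_sphere(rng, theta.size)
        total += two_point_estimate(f, theta, delta, v)
    return total / num_samples

# Hypothetical stand-in for the closed-loop cost: a simple quadratic.
A = np.diag([1.0, 2.0, 3.0])

def quadratic_cost(theta):
    return 0.5 * theta @ A @ theta

rng = np.random.default_rng(0)
theta0 = np.array([1.0, -1.0, 0.5])
grad_mc = smoothed_gradient(quadratic_cost, theta0, delta=0.1,
                            num_samples=20000, rng=rng)
grad_exact = A @ theta0  # gradient of the smoothed quadratic equals A @ theta
print("Monte Carlo estimate:", grad_mc)
print("Exact gradient:      ", grad_exact)
```

Each single-direction estimate is noisy; only the average approaches the exact gradient. A hybrid scheme of the kind the summary describes would presumably use model-based gradients to reduce that noise, but that combination is not reproduced here.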

Figures (2)

  • Figure 1: Tracking cost and constraint violation across iterations.
  • Figure 2: Comparison of position (left) and attitude (right) trajectories obtained with the trained controller (solid) and the controller tuned using the exact model (dash-dotted).

Theorems & Definitions (14)

  • definition 1: Definitions 1.4 and 1.5, coste1999introduction
  • definition 2: bolte2021conservative
  • lemma 1
  • proof
  • lemma 2
  • theorem 1
  • lemma 3
  • proof
  • lemma 4
  • proof
  • ...and 4 more