Policy Optimization for Unknown Systems using Differentiable Model Predictive Control
Riccardo Zuliani, Efe C. Balta, John Lygeros
TL;DR
This paper addresses policy optimization for MPC under unknown dynamics. It proposes a hybrid gradient framework that blends model-based gradients, obtained by differentiating through the MPC problem, with zeroth-order updates derived from randomized smoothing. It establishes convergence to a Goldstein δ-critical point using the notions of definability and conservative Jacobians, so the optimization remains well-founded even when the model used by the MPC does not match the true dynamics. The approach is validated on a 12-dimensional quadcopter task, where the hybrid method achieves fast transients and near-optimal performance, outperforming both purely model-based and purely model-free baselines. The work offers a practical path to safe, data-efficient MPC design in settings with model uncertainty and lays groundwork for stronger safety guarantees in future work.
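For readers unfamiliar with the convergence target named in the TL;DR, the block below states standard definitions from nonsmooth optimization. These are not quoted from the paper. It defines the Goldstein δ-subdifferential, built from the Clarke subdifferential ∂f of a locally Lipschitz cost f, and the corresponding criticality notion. It also gives Gaussian randomized smoothing, one common way to obtain zeroth-order gradient estimates. Two points are assumptions here: that the paper uses Gaussian rather than uniform-ball smoothing, and whether its δ-critical notion includes an extra tolerance ε.

```latex
% Goldstein delta-subdifferential of a locally Lipschitz f at theta (\partial f: Clarke subdifferential)
\partial_\delta f(\theta) = \operatorname{conv}\Big( \bigcup_{\|\theta' - \theta\| \le \delta} \partial f(\theta') \Big)

% theta is Goldstein delta-critical if 0 \in \partial_\delta f(\theta);
% a relaxed (\delta, \epsilon) version asks for \operatorname{dist}\big(0, \partial_\delta f(\theta)\big) \le \epsilon

% Gaussian smoothing of f and its gradient, estimable from cost evaluations only
f_\mu(\theta) = \mathbb{E}_{u \sim \mathcal{N}(0, I)}\big[ f(\theta + \mu u) \big],
\qquad
\nabla f_\mu(\theta) = \mathbb{E}_{u \sim \mathcal{N}(0, I)}\Big[ \frac{f(\theta + \mu u) - f(\theta - \mu u)}{2\mu}\, u \Big]
```

The two-point form of the gradient is exact. Substituting u → -u shows that E[f(θ - μu) u] = -E[f(θ + μu) u], so the symmetric difference equals the one-sided identity ∇f_μ(θ) = E[f(θ + μu) u] / μ while typically having lower variance when sampled.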
Abstract
Model-based policy optimization often struggles with inaccurate system dynamics models, leading to suboptimal closed-loop performance. This challenge is especially evident in Model Predictive Control (MPC) policies, which rely on the model for real-time trajectory planning and optimization. We introduce a novel policy optimization framework for MPC-based policies that combines differentiable optimization with zeroth-order optimization. By blending model-based and model-free gradient estimates, our method achieves faster transient performance than fully data-driven approaches while retaining convergence guarantees, even under model uncertainty. We demonstrate the effectiveness of the proposed approach on a nonlinear control task involving a 12-dimensional quadcopter model.
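To make the idea of blending model-based and model-free gradient estimates concrete, here is a minimal Python/NumPy sketch. It is not the authors' algorithm. The names `zeroth_order_grad`, `hybrid_grad` and `run_descent` are hypothetical. The model-based gradient is a stand-in for differentiating through an MPC built on the nominal model. The convex blend uses a geometrically decaying weight `alpha_k = decay**k`, which is an assumed schedule chosen only for illustration. The zeroth-order part is the standard two-point Gaussian-smoothing estimator, which needs only evaluations of the true closed-loop cost.

```python
import numpy as np


def zeroth_order_grad(cost, theta, mu=1e-2, num_samples=20, rng=None):
    """Two-point estimate of the gradient of the Gaussian-smoothed cost
    f_mu(theta) = E_u[cost(theta + mu * u)], u ~ N(0, I), using cost evaluations only."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)
        # Symmetric finite difference along a random Gaussian direction.
        grad += (cost(theta + mu * u) - cost(theta - mu * u)) / (2.0 * mu) * u
    return grad / num_samples


def hybrid_grad(cost, model_grad, theta, alpha, rng=None, **zo_kwargs):
    """Hypothetical blend: alpha * model-based gradient + (1 - alpha) * zeroth-order estimate."""
    g_model = model_grad(theta)  # stand-in for a gradient through an MPC on the nominal model
    g_zo = zeroth_order_grad(cost, theta, rng=rng, **zo_kwargs)
    return alpha * g_model + (1.0 - alpha) * g_zo


def run_descent(cost, model_grad, theta0, steps=200, lr=0.1, decay=0.9, seed=0):
    """Gradient descent whose model weight decays geometrically: alpha_k = decay**k."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(steps):
        alpha = decay**k  # assumed schedule: lean on the model early, on data later
        theta = theta - lr * hybrid_grad(cost, model_grad, theta, alpha, rng=rng)
    return theta


# Toy problem: the "true" cost is a quadratic with minimizer theta_star,
# while the model-based gradient is biased toward a wrong minimizer theta_model.
theta_star = np.array([1.0, -2.0, 0.5])
theta_model = np.array([1.3, -1.5, 0.5])
true_cost = lambda th: float(np.sum((th - theta_star) ** 2))
biased_model_grad = lambda th: 2.0 * (th - theta_model)

theta_hybrid = run_descent(true_cost, biased_model_grad, np.zeros(3))
# decay=1.0 keeps alpha = 1, i.e. a purely model-based baseline.
theta_model_only = run_descent(true_cost, biased_model_grad, np.zeros(3), decay=1.0)
print(np.linalg.norm(theta_hybrid - theta_star), np.linalg.norm(theta_model_only - theta_star))
```

On this toy quadratic, the model-only run settles at `theta_model`, roughly 0.58 away from `theta_star`, because its gradient is biased. As the weight shifts toward the zeroth-order estimate, the hybrid run is pulled to `theta_star`. A fixed decay is only one possible design. An adaptive weight driven by how much the model and data gradients disagree would be another.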
