Bridging RL and MPC for mixed-integer optimal control with application to Formula 1 race strategies

Joschua Wüthrich, Romir Damle, Giona Fieni, Melanie N. Zeilinger, Christopher H. Onder, Andrea Carron

Abstract

We propose a hybrid reinforcement learning (RL) and model predictive control (MPC) framework for mixed-integer optimal control, where discrete variables enter the cost and dynamics but not the constraints. Existing hierarchical approaches use RL only for the discrete action space, leaving continuous optimization to MPC. Unlike these methods, we train the RL agent on the full hybrid action space, ensuring consistency with the cost of the underlying Markov decision process. During deployment, the RL actor is rolled out over the prediction horizon to parametrize an integer-free nonlinear MPC through the discrete action sequence and provide a continuous warm-start. The learned critic serves as a terminal cost to capture long-term performance. We prove recursive feasibility, and validate the framework on a Formula 1 race strategy problem. The hybrid method achieves near-optimal performance relative to an offline mixed-integer nonlinear program benchmark, outperforming a standalone RL agent. Moreover, the hybrid scheme enables adaptation to unseen disturbances through modular MPC extensions at zero retraining cost.
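The deployment loop in the abstract — roll the RL actor out over the horizon to fix the discrete action sequence and warm-start the continuous inputs, solve an integer-free MPC with the learned critic as terminal cost, then apply the first MPC continuous input and the first RL discrete input — can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: `actor`, `critic`, `dynamics`, and the finite-difference `reduced_mpc` solver below are all hypothetical stand-ins for the learned models and the nonlinear program.

```python
import numpy as np

def actor(x):
    """Toy RL policy: returns a (continuous, discrete) action pair for state x."""
    u_c = -0.5 * x                      # continuous part (e.g., energy deployment)
    u_d = int(x > 1.0)                  # discrete part (e.g., a pit/compound choice)
    return u_c, u_d

def critic(x):
    """Toy learned value estimate, used as the MPC terminal cost C_theta."""
    return x ** 2

def dynamics(x, u_c, u_d):
    """Toy mixed dynamics: the discrete action switches a mode offset."""
    return 0.9 * x + u_c + 0.2 * u_d

def rollout_actor(x0, N):
    """Roll the actor out over horizon N to obtain the discrete action
    sequence and a continuous warm-start trajectory."""
    x, u_cs, u_ds = x0, [], []
    for _ in range(N):
        u_c, u_d = actor(x)
        u_cs.append(u_c)
        u_ds.append(u_d)
        x = dynamics(x, u_c, u_d)
    return np.array(u_cs), u_ds

def reduced_mpc(x0, u_c_init, u_d_fixed, stage_cost, iters=200, lr=0.05):
    """Stand-in for the integer-free NLP: with the discrete sequence fixed,
    refine the continuous inputs by finite-difference gradient descent,
    using the critic as terminal cost."""
    def total_cost(u):
        x, J = x0, 0.0
        for k in range(len(u)):
            J += stage_cost(x, u[k], u_d_fixed[k])
            x = dynamics(x, u[k], u_d_fixed[k])
        return J + critic(x)

    u, eps = u_c_init.copy(), 1e-5
    for _ in range(iters):
        g = np.array([(total_cost(u + eps * e) - total_cost(u - eps * e)) / (2 * eps)
                      for e in np.eye(len(u))])
        u -= lr * g
    return u, total_cost(u)

# Closed loop for one step: apply the first MPC continuous input
# together with the first RL discrete input.
stage = lambda x, uc, ud: x ** 2 + 0.1 * uc ** 2 + 0.05 * ud
x, N = 2.0, 5
u_c_ws, u_d_seq = rollout_actor(x, N)
u_c_opt, J = reduced_mpc(x, u_c_ws, u_d_seq, stage)
x_next = dynamics(x, u_c_opt[0], u_d_seq[0])
```

Note the division of labor: the actor alone decides the integers (so the MPC never branches), while the MPC only refines the continuous inputs starting from the actor's warm-start, which is why the refined cost can never exceed the pure-RL rollout cost in this sketch.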

Paper Structure

This paper contains 17 sections, 1 theorem, 16 equations, 5 figures, 1 table.

Key Result

Proposition 1

Let Assumptions 1 and 2 hold. If the reduced MPC problem is feasible at time $j=0$, then it remains feasible for all $j \in \mathbb{N}$, and consequently the hybrid RL-MPC scheme is recursively feasible. $\blacktriangleleft$

Figures (5)

  • Figure 3: Hybrid RL-MPC framework. The RL actor is rolled out over the prediction horizon $N$ to supply discrete action trajectories and a continuous warm-start to the MPC, which optimizes continuous inputs with $C_\theta(x,u^\mathrm{c},u^\mathrm{d})$ as terminal cost. Only the first continuous input ${u}_{0}^\mathrm{c,*}$ from the MPC and the first discrete input ${u}_{0}^\mathrm{d,RL}$ from the RL agent are applied to the system.
  • Figure 4:
  • Figure 5:
  • Figure 6:
  • Figure 7:

Theorems & Definitions (5)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Proof
  • Remark 3