Reinforcement Learning with Model Predictive Control for Highway Ramp Metering

Filippo Airaldi, Bart De Schutter, Azita Dabiri

TL;DR

This work explores the synergy between model-based and learning-based strategies for traffic flow management. It proposes an approach to ramp metering control that embeds Reinforcement Learning techniques within the Model Predictive Control (MPC) framework.

Abstract

Against the backdrop of an increasingly pressing need for effective urban and highway transportation systems, this work explores the synergy between model-based and learning-based strategies to enhance traffic flow management. It proposes an approach to ramp metering control that embeds Reinforcement Learning (RL) techniques within the Model Predictive Control (MPC) framework. The control problem is formulated as an RL task by crafting a stage cost function that captures the traffic conditions, the variability of the control action, and violations of the constraint on the maximum number of vehicles in the queue. An MPC-based RL approach, which uses the MPC optimization problem as the function approximator for the RL algorithm, is proposed to learn to control an on-ramp efficiently and satisfy its constraints despite uncertainties in the system model and variable demands. Simulations are performed on a benchmark small-scale highway network to compare the proposed methodology against other state-of-the-art control approaches. Results show that, starting from a poorly tuned MPC controller with an imprecise model, the proposed methodology learns to improve the control policy so that congestion in the network is reduced and constraints are satisfied, yielding performance superior to that of the other controllers.
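The three-part stage cost described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name, the argument layout, the default weights `c_var` and `c_viol`, the squared-difference form of the variability term, and the linear penalty on queue violations. The total-time-spent term follows the usual convention of counting vehicles on the road and in queues over one time step; the paper's actual cost may be normalized or weighted differently.

```python
def stage_cost(densities, queues, action, prev_action, *,
               seg_length_km=1.0, lanes=2, step_h=10.0 / 3600.0,
               ramp_queue_idx=1, w_max=50.0, c_var=0.4, c_viol=10.0):
    """Hypothetical RL stage cost for ramp metering (illustrative only).

    densities   : per-segment densities [veh/km/lane]
    queues      : per-origin queue lengths [veh]
    action      : current metering rate
    prev_action : metering rate applied at the previous step
    All default parameter values are placeholders, not the paper's.
    """
    # Total time spent (TTS): vehicles on the road plus vehicles queuing,
    # multiplied by the duration of one time step.
    vehicles_on_road = sum(rho * seg_length_km * lanes for rho in densities)
    tts = step_h * (vehicles_on_road + sum(queues))

    # Penalize abrupt changes of the control action.
    variability = c_var * (action - prev_action) ** 2

    # Penalize only the part of the on-ramp queue that exceeds w_max.
    violation = c_viol * max(0.0, queues[ramp_queue_idx] - w_max)

    return tts + variability + violation
```

With this form, a queue below `w_max` contributes nothing to the violation term, so the penalty acts as a soft constraint: the learning agent is discouraged from exceeding the threshold rather than prevented from doing so.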

Paper Structure (23 sections, 24 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Diagram of the MPC-based RL architecture
  • Figure 2: Structure of the three-segment highway network
  • Figure 3: External inputs affecting the network's dynamics, in the form of the demands at the origins (upper) and the congestion scenario at the destination (lower). The shaded areas represent the 2-standard deviation ranges from which the random demands are sampled, whereas the solid lines represent one random sample for each.
  • Figure 4: Evolution of the three contributions to the RL stage cost during the learning process, namely, from top to bottom, the total time spent (TTS), the variability of the control action, and the violation of the maximum queue constraint on $\text{O}_2$
  • Figure 5: On average, the policy $\pi_{\bm\theta}$ gets better at avoiding constraint violations as it learns (the red line represents the threshold $w_\text{max}$ imposed on the queue on the on-ramp $\text{O}_2$)
  • ...and 5 more figures

Theorems & Definitions (2)

  • Remark 1
  • Remark 2