Explainable Reinforcement Learning for Formula One Race Strategy

Devin Thomas; Junqi Jiang; Avinash Kori; Aaron Russo; Steffen Winkler; Stuart Sale; Joseph McMillan; Francesco Belardinelli; Antonio Rago


TL;DR

This work introduces Race Strategy Reinforcement Learning (RSRL), a deep recurrent Q-network (DRQN) framework for optimising Formula One race strategy under partial observability, evaluated on the 2023 Bahrain Grand Prix. The authors design a portable system architecture built on a UnifiedRaceState/UnifiedRaceStrategy abstraction and a four-action pitstop space, so that the model can be trained on Monte Carlo race simulations and deployed with live data. RSRL outperforms the fixed-strategy and Mercedes state-of-the-art (SOTA) baselines, achieving an average finishing position of P5.33 compared with P5.63 and P5.86. To build trust, the approach integrates explainable AI techniques (TimeSHAP, VIPER, and counterfactual decision trees), providing feature-level attributions, faithful surrogate models, and actionable counterfactuals. The work also demonstrates generalisation across tracks: increasing training diversity by training on multiple tracks can improve performance on unseen tracks, facilitating practical deployment in real-world racing contexts.
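The state/strategy abstraction and the four-action pitstop space can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the field names, the feature encoding, the split of the four actions into "stay out" plus one pit action per dry compound, and the `RaceDataSource`, `ReplaySource` and `decide` names are hypothetical and are not taken from the RSRL code. The sketch only shows how a simulator and a live data feed become interchangeable once both produce the same state object.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List, Protocol, Tuple


class PitAction(IntEnum):
    # ASSUMPTION: the paper uses four pitstop actions; this particular split
    # ("stay out" plus one pit action per dry compound) is illustrative only.
    NO_PIT = 0
    PIT_SOFT = 1
    PIT_MEDIUM = 2
    PIT_HARD = 3


@dataclass
class UnifiedRaceState:
    # Hypothetical fields; not the actual contents of RSRL's UnifiedRaceState.
    lap: int
    total_laps: int
    position: int      # 1 = P1 (first) ... 20 = P20 (last)
    tyre_age: int      # laps completed on the current set
    compound: str      # "SOFT", "MEDIUM" or "HARD"

    def to_vector(self) -> List[float]:
        """Flatten the state into model input features (hypothetical encoding)."""
        one_hot = [1.0 if self.compound == c else 0.0 for c in ("SOFT", "MEDIUM", "HARD")]
        return [self.lap / self.total_laps, self.position / 20.0, float(self.tyre_age), *one_hot]


@dataclass
class UnifiedRaceStrategy:
    # Hypothetical: the (lap, action) decisions taken so far in the race.
    decisions: List[Tuple[int, PitAction]] = field(default_factory=list)

    def record(self, lap: int, action: PitAction) -> None:
        self.decisions.append((lap, action))


class RaceDataSource(Protocol):
    # Anything that yields UnifiedRaceState objects (a Monte Carlo simulator,
    # a live timing adapter, ...) can drive the same policy.
    def current_state(self) -> UnifiedRaceState: ...


class ReplaySource:
    """Toy data source that replays a fixed list of states, then holds the last one."""

    def __init__(self, states: List[UnifiedRaceState]) -> None:
        self._states = states
        self._index = 0

    def current_state(self) -> UnifiedRaceState:
        state = self._states[min(self._index, len(self._states) - 1)]
        self._index += 1
        return state


def decide(source: RaceDataSource, policy: Callable[[List[float]], int],
           strategy: UnifiedRaceStrategy) -> PitAction:
    """Query the policy on the latest state and log the chosen action."""
    state = source.current_state()
    action = PitAction(policy(state.to_vector()))
    strategy.record(state.lap, action)
    return action
```

Because `decide` depends only on `current_state()`, replacing `ReplaySource` with a simulator or live-data adapter requires no change to the policy code in this sketch, which mirrors the data-source substitution described for the system architecture.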

Abstract

In Formula One, teams compete to develop their cars and achieve the highest possible finishing position in each race. During a race, however, teams cannot alter the car, so they must improve their cars' finishing positions through race strategy, i.e. optimising which tyre compounds to put on the car and when to do so. In this work, we introduce a reinforcement learning model, RSRL (Race Strategy Reinforcement Learning), to control race strategies in simulations, offering a faster alternative to the industry standard of hard-coded and Monte Carlo-based race strategies. Controlling cars whose pace corresponds to an expected finishing position of P5.5 (where P1 is first place and P20 is last place), RSRL achieves an average finishing position of P5.33 on our test race, the 2023 Bahrain Grand Prix, outperforming the best baseline of P5.63. We then demonstrate, in a generalisability study, how training can prioritise performance on a single track or across multiple tracks. Further, we supplement model predictions with feature importance, decision tree-based surrogate models, and decision tree counterfactuals to improve user trust in the model. Finally, we provide illustrations that exemplify our approach in real-world situations, drawing parallels between simulations and reality.
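For context on the Monte Carlo-based approach that RSRL is positioned against, the sketch below scores every one-stop strategy by averaging many noisy simulated race times and keeps the fastest. It is a generic illustration, not Mercedes' model or the paper's simulator: the tyre model, lap time, degradation rates, pit loss and noise level are made-up constants, and it minimises expected race time rather than finishing position. The exhaustive search, with many rollouts per candidate, hints at why a learned policy can be faster at decision time.

```python
import random
import statistics
from itertools import product

# ASSUMPTION: toy race model; every constant is illustrative, not fitted data.
BASE_LAP = 95.0                                               # seconds per lap
PACE_OFFSET = {"SOFT": 0.0, "MEDIUM": 0.4, "HARD": 0.8}      # s slower than soft
DEGRADATION = {"SOFT": 0.10, "MEDIUM": 0.06, "HARD": 0.04}   # s added per lap of tyre age
PIT_LOSS = 22.0                                               # s lost per pitstop
NOISE_SD = 0.3                                                # lap-to-lap randomness (s)


def simulate_race(strategy, total_laps, rng):
    """One noisy rollout of a one-stop strategy; returns total race time in seconds."""
    compound, pit_lap, next_compound = strategy
    tyre_age, race_time = 0, 0.0
    for lap in range(1, total_laps + 1):
        if lap == pit_lap:
            compound, tyre_age = next_compound, 0
            race_time += PIT_LOSS
        race_time += BASE_LAP + PACE_OFFSET[compound] + DEGRADATION[compound] * tyre_age
        race_time += rng.gauss(0.0, NOISE_SD)
        tyre_age += 1
    return race_time


def monte_carlo_best_strategy(total_laps, n_samples=50, seed=0):
    """Score every one-stop strategy by its mean simulated time; return the fastest."""
    rng = random.Random(seed)
    candidates = [
        (first, pit_lap, second)
        for first, second in product(PACE_OFFSET, repeat=2)
        if first != second                     # rule: use two different dry compounds
        for pit_lap in range(5, total_laps - 4)
    ]
    mean_times = {
        s: statistics.mean(simulate_race(s, total_laps, rng) for _ in range(n_samples))
        for s in candidates
    }
    best = min(mean_times, key=mean_times.get)
    return best, mean_times[best]
```

For a 57-lap race, `monte_carlo_best_strategy(57)` evaluates 288 candidates (6 compound orders × 48 pit laps) with 50 rollouts each; two-stop strategies or opponent interactions would multiply this cost further.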

Paper Structure (10 sections, 1 equation, 6 figures, 5 tables)

Introduction
Background and Related Work
Race Strategy Reinforcement Learning
    Problem Formalisation
    System Architecture
Evaluation
    Model Performance
    Generalisability
    Explanations
Conclusions and Future Work

Figures (6)

Figure 1: The implemented system architecture uses abstraction to provide future flexibility, extensibility and portability. It allows different data sources to be substituted for model prediction and allows race states and strategies to be modified.
Figure 2: Average finishing positions for Mercedes' SOTA model, RSRL, and the Fixed Strategy model.
Figure 3: The most common tyre strategies generated by each generalisability model, Mercedes' SOTA model, and two of the fixed strategies from the Fixed Strategy model for the Abu Dhabi, Japan, Mexico and Saudi Arabia Grands Prix. The black 'Pit Window' represents an average period in which a pitstop is executed.
Figure 4: The feature importance for RSRL on lap 10 of the 2023 Bahrain Grand Prix.
Figure 5: Accuracy during training and testing of 50 iterations of VIPER decision trees. Each iteration covers all previous data points queried from the oracle (the best RSRL model), up to 26,714 in iteration 50.
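Figure 5 describes VIPER iterations that aggregate every data point queried from the oracle. A minimal sketch of such a loop follows, assuming a toy problem: the oracle Q-function, environment, state features and all constants are hypothetical, and a depth-1 decision tree (a stump) stands in for the deeper surrogate trees a real application would fit. In the VIPER style, iteration 1 rolls out the oracle, later iterations roll out the latest tree, every visited state is labelled with the oracle's action, and each sample is weighted by the gap between its best and worst Q-values so that states where a wrong action is costly count more. VIPER itself resamples data in proportion to these weights and selects the best tree on held-out rollouts; this sketch instead uses the weights directly in the loss and returns every tree.

```python
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Row = Tuple[State, int, float]  # (state, oracle action, VIPER sample weight)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def fit_weighted_stump(data: List[Row]) -> Callable[[State], int]:
    """Depth-1 decision tree minimising weighted misclassification of oracle actions."""
    def majority(rows: List[Row]) -> Tuple[int, float]:
        totals = {}
        for _, action, weight in rows:
            totals[action] = totals.get(action, 0.0) + weight
        label = max(totals, key=totals.get)
        return label, sum(totals.values()) - totals[label]   # (label, weighted error)

    label, best_err = majority(data)
    best = (None, None, label, label)   # (feature, threshold, left label, right label)
    for j in range(len(data[0][0])):
        for t in sorted({s[j] for s, _, _ in data}):
            left = [r for r in data if r[0][j] <= t]
            right = [r for r in data if r[0][j] > t]
            if not left or not right:
                continue
            (l_lab, l_err), (r_lab, r_err) = majority(left), majority(right)
            if l_err + r_err < best_err:
                best_err, best = l_err + r_err, (j, t, l_lab, r_lab)
    j, t, l_lab, r_lab = best
    return lambda s: l_lab if j is None or s[j] <= t else r_lab


def viper(oracle_q, reset, step, n_iter=5, n_rollouts=10, horizon=20, seed=0):
    """DAgger-style VIPER loop; returns the tree from every iteration and the data."""
    rng = random.Random(seed)
    data: List[Row] = []   # aggregated over ALL iterations (cf. Figure 5)
    policy = None          # iteration 1 rolls out the oracle itself
    trees = []
    for _ in range(n_iter):
        for _ in range(n_rollouts):
            s = reset(rng)
            for _ in range(horizon):
                q = oracle_q(s)
                a_star = argmax(q)
                # VIPER weight V*(s) - min_a Q*(s, a): cost of the worst mistake here.
                data.append((s, a_star, max(q) - min(q)))
                s = step(s, a_star if policy is None else policy(s), rng)
        policy = fit_weighted_stump(data)
        trees.append(policy)
    return trees, data


# Hypothetical toy problem: state = (tyre_age, laps_left); action 0 = stay out,
# actions 1-3 = pit (illustrative labels). Q-values are invented for the demo.
def toy_oracle_q(s: State) -> List[float]:
    tyre_age, _ = s
    return [-0.1 * tyre_age, -2.0, -2.5, -3.0]


def toy_reset(rng: random.Random) -> State:
    return (rng.randint(0, 30), rng.randint(20, 57))


def toy_step(s: State, action: int, rng: random.Random) -> State:
    tyre_age, laps_left = s
    return (tyre_age + 1 if action == 0 else 0, max(laps_left - 1, 0))
```

Calling `viper(toy_oracle_q, toy_reset, toy_step)` collects 5 × 10 × 20 = 1,000 aggregated samples, and the final stump recovers the toy oracle's rule of staying out on young tyres and pitting on old ones.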
