Learning-based Multi-agent Race Strategies in Formula 1

Giona Fieni, Joschua Wüthrich, Marc-Philippe Neumann, Christopher H. Onder

TL;DR

A reinforcement learning approach for multi-agent race strategy optimization. Because it relies only on information available during real races, it can support race strategists' decisions before and during races.

Abstract

In Formula 1, race strategies are adapted according to evolving race conditions and competitors' actions. This paper proposes a reinforcement learning approach for multi-agent race strategy optimization. Agents learn to balance energy management, tire degradation, aerodynamic interaction, and pit-stop decisions. Building on a pre-trained single-agent policy, we introduce an interaction module that accounts for the behavior of competitors. The combination of the interaction module and a self-play training scheme generates competitive policies, and agents are ranked based on their relative performance. Results show that the agents adapt pit timing, tire selection, and energy allocation in response to opponents, achieving robust and consistent race performance. Because the framework relies only on information available during real races, it can support race strategists' decisions before and during races.

Paper Structure (17 sections, 18 equations, 6 figures, 2 tables)

This paper contains 17 sections, 18 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Schematic of the agent-environment interaction. Agent $1$ is the agent to be trained, while agent $i$ is fixed and is part of the environment. With their actions, they directly affect the ego car. Aerodynamic interaction couples the models of the two cars. The observations are split according to whether they come from the ego car or from the competitor's car.
  • Figure 2: Schematic of the race car model. Inputs are the agent's action $\mathbf{a}$, the gap time to the opponent $t_\mathrm{gap}$, and the additional lap time caused by the aerodynamic interaction $\Delta T_\mathrm{int}$. Outputs are the observation of the ego car $\mathbf{o}$ and the available observations for the opponent $\mathbf{\tilde{o}}$. For a detailed mathematical description, the reader is referred to fieni2025towards.
  • Figure 3: Schematic of the agent's structure. The single-agent policy is taken from fieni2025towards and its weights are kept frozen during training, so only the interaction module is trained. Inputs are the observation of the ego car $\mathbf{o}$ and the observations of the opponent $\mathbf{\tilde{o}}$. The nominal policy $\mathbf{a}_\mathrm{nom}$ is combined with the policy of the interaction module $\mathbf{\Delta a}$ to output the action $\mathbf{a}$.
  • Figure 4: Custom self-play training scheme. The training agent is shown on the left, while the opponent embedded in the environment remains fixed. During the first iteration, the single-agent policy is the only opponent. Then, the training agent with the highest Elo score ("best") is selected for the pool of future opponents of iteration $n$.
  • Figure 5: Race strategies and race time difference for the duel between $A$ (in blue) and $B$ (in red). The first plot shows the normalized fuel energy allocation, the second one the normalized battery energy allocation, and the third one the pit stop decision variable. Agent $A$ starts $0.5\,\mathrm{s}$ behind $B$, and a negative gap time means that agent $A$ is ahead.
  • ...and 1 more figure
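
The caption of Figure 4 selects the training agent with the highest Elo score for the opponent pool, but this excerpt does not state how the scores are updated. As a rough illustration only, the sketch below assumes the standard Elo update with a logistic expected score, a scale of 400, and a K-factor of 32. The function names, the K-factor, and the example ratings are hypothetical and are not taken from the paper.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one duel.

    score_a is 1.0 if A finishes ahead, 0.0 if B does, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


def select_best(ratings):
    """Name of the agent with the highest Elo score (the "best" agent)."""
    return max(ratings, key=ratings.get)


# Hypothetical example: two agents start level and A wins one duel.
ratings = {"agent_A": 1500.0, "agent_B": 1500.0}
ratings["agent_A"], ratings["agent_B"] = update_elo(
    ratings["agent_A"], ratings["agent_B"], score_a=1.0
)
best = select_best(ratings)
```

With equal starting ratings the expected score is 0.5, so a win moves the two ratings apart by half the K-factor each, and `select_best` then picks the winner as the candidate for the opponent pool.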