A Fairness-Oriented Multi-Objective Reinforcement Learning approach for Autonomous Intersection Management

Matteo Cederle, Marco Fabris, Gian Antonio Susto

Abstract

This study introduces a novel multi-objective reinforcement learning (MORL) approach for autonomous intersection management, aiming to balance traffic efficiency and environmental sustainability across electric and internal combustion vehicles. The proposed method utilizes MORL to identify Pareto-optimal policies, with a post-hoc fairness criterion guiding the selection of the final policy. Simulation results in a complex intersection scenario demonstrate the approach's effectiveness in optimizing traffic efficiency and emissions reduction while ensuring fairness across vehicle categories. We believe that this criterion can lay the foundation for ensuring equitable service, while fostering safe, efficient, and sustainable practices in smart urban mobility.
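The two-stage idea in the abstract, first finding Pareto-optimal policies and then picking one with a post-hoc fairness criterion, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `PolicyResult` record, the numeric values, and the specific fairness rule (smallest absolute travel-time gap between electric and internal combustion vehicles) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyResult:
    """Hypothetical evaluation summary of one trained policy."""
    omega: float            # preference weight used to train the policy (assumed)
    efficiency: float       # average traffic efficiency, higher is better
    emissions: float        # average emissions, lower is better
    travel_time_gap: float  # mean travel time of electric minus petrol vehicles


def dominates(a: PolicyResult, b: PolicyResult) -> bool:
    """True if `a` is at least as good as `b` on both objectives and better on one."""
    no_worse = a.efficiency >= b.efficiency and a.emissions <= b.emissions
    strictly_better = a.efficiency > b.efficiency or a.emissions < b.emissions
    return no_worse and strictly_better


def pareto_front(results: list[PolicyResult]) -> list[PolicyResult]:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


def select_fair(front: list[PolicyResult]) -> PolicyResult:
    """Assumed fairness rule: smallest absolute travel-time gap between categories."""
    return min(front, key=lambda r: abs(r.travel_time_gap))


# Made-up numbers, for illustration only.
results = [
    PolicyResult(omega=0.0, efficiency=5.0, emissions=1.0, travel_time_gap=4.0),
    PolicyResult(omega=0.5, efficiency=7.0, emissions=2.0, travel_time_gap=-0.5),
    PolicyResult(omega=0.75, efficiency=6.5, emissions=2.5, travel_time_gap=0.1),
    PolicyResult(omega=1.0, efficiency=9.0, emissions=4.0, travel_time_gap=-3.0),
]

front = pareto_front(results)
chosen = select_fair(front)
print([r.omega for r in front], chosen.omega)
```

In this toy data the policy with `omega=0.75` has the smallest gap overall, but it is dominated and therefore never considered: the fairness criterion is applied only after the Pareto filter, which is what makes it post-hoc.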

Paper Structure

This paper contains 14 sections and 5 figures.

Figures (5)

  • Figure 1: Training metrics of our approach. One panel shows the evolution of the hypervolume metric as a function of the evaluation runs performed during training; the other displays the evolution of the number of crashes as training progresses. The shaded regions represent the $95\%$ confidence interval of the average evaluation over 10 random seeds.
  • Figure 2: Architecture of the actor network. We refer to Klimke et al. (2022) and Schlichtkrull et al. (2018) for the description of the graph layers $RGCN^+$ and $RGCN$, respectively. The critic network is analogous, except that its $dec$ layer outputs a single value instead of the joint action for all the vehicles.
  • Figure 3: Pareto front for the considered MORL problem. The average efficiency maximization and emissions minimization objectives are represented on the x and y axes, respectively. Each point on the front corresponds to a different value of $\omega$.
  • Figure 4: Velocity and emission distributions per time step during evaluation. Each point on the x axis corresponds to a Pareto-efficient solution, ordered from left to right along the front in Figure 3.
  • Figure 5: Average difference in travel time between electric and petrol vehicles. Each point on the x axis corresponds to a Pareto-efficient solution, ordered from left to right along the front in Figure 3. The dotted line and the shaded areas represent the linear interpolation of the points and their $95\%$ confidence interval, respectively.
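
The hypervolume metric mentioned in the Figure 1 caption measures the region of objective space dominated by a set of solutions, relative to a reference point. For two objectives it reduces to an area, which the following minimal sketch computes. It assumes both objectives are expressed as maximization (emissions are negated here), and the function name `hypervolume_2d`, the reference point, and the sample points are illustrative choices, not values from the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded below by `ref` (both objectives maximized)."""
    # Ignore points that do not strictly improve on the reference in both objectives.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)

    area = 0.0
    best_y = ref[1]  # highest second objective seen so far in the sweep
    for x, y in pts:
        if y > best_y:
            # Add the horizontal strip that this point contributes beyond earlier ones.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Made-up (efficiency, -emissions) pairs; the last one is dominated and adds no area.
sample_front = [(9.0, -4.0), (7.0, -2.0), (5.0, -1.0), (6.5, -2.5)]
reference = (0.0, -5.0)  # assumed worst-case point
hv = hypervolume_2d(sample_front, reference)
print(hv)
```

A larger hypervolume indicates a front that is both closer to the ideal point and more spread out, which is why it is a common single-number training signal for multi-objective methods.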