Altruistic Maneuver Planning for Cooperative Autonomous Vehicles Using Multi-agent Advantage Actor-Critic

Behrad Toghi, Rodolfo Valiente, Dorsa Sadigh, Ramtin Pedarsani, Yaser P. Fallah

TL;DR

The paper tackles the problem of making mixed-autonomy traffic safer and more efficient by enabling autonomous vehicles (AVs) to act altruistically toward human drivers and other AVs, using a decentralized multi-agent A2C framework guided by social value orientation. It introduces a VelocityMap-based state representation, discrete meta-actions, and a decentralized reward that blends egoistic and social utilities, with the balance set by each agent's parameter φ_i. Trained in a semi-sequential fashion without experience replay, the agents learn sequences of maneuvers that improve merging safety and overall traffic flow, outperforming egoistic baselines. While the results are promising, the work relies on synthetic driving models; it suggests future work with richer human-driver data and recurrent architectures to further improve realism and adaptability.
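The decentralized reward described above can be illustrated with a short sketch. It assumes the common social value orientation convention in which the egoistic and social terms are weighted by cos φ_i and sin φ_i, and it assumes the social utility is a plain, unweighted sum over the surrounding vehicles. Both are simplifying assumptions rather than the paper's exact definition, and the helper name `svo_reward` is hypothetical.

```python
import math
from typing import Sequence


def svo_reward(phi: float, ego_utility: float, others_utilities: Sequence[float]) -> float:
    """Blend egoistic and social utility for one AV (hypothetical helper).

    phi: social value orientation angle in radians (0 = egoistic,
         pi/4 = prosocial, pi/2 = altruistic).
    ego_utility: the AV's own utility at this step.
    others_utilities: utilities of the surrounding vehicles (AVs and HVs).
    """
    # Assumption: social utility is the unweighted sum of the others' utilities.
    social_utility = sum(others_utilities)
    return math.cos(phi) * ego_utility + math.sin(phi) * social_utility


if __name__ == "__main__":
    others = [2.0, 3.0]
    for name, phi in [("egoistic", 0.0), ("prosocial", math.pi / 4), ("altruistic", math.pi / 2)]:
        print(name, round(svo_reward(phi, 1.0, others), 3))
```

With φ_i = 0 the agent optimizes only its own utility, with φ_i = π/2 it optimizes only for the others, and intermediate angles trade the two off; because each AV carries its own φ_i, the reward stays decentralized.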

Abstract

With the adoption of autonomous vehicles on our roads, we will witness a mixed-autonomy environment where autonomous and human-driven vehicles must learn to co-exist by sharing the same road infrastructure. To attain socially desirable behaviors, autonomous vehicles must be instructed to consider the utility of other vehicles around them in their decision-making process. In particular, we study the maneuver planning problem for autonomous vehicles and investigate how a decentralized reward structure can induce altruism in their behavior and incentivize them to account for the interests of other autonomous and human-driven vehicles. This is a challenging problem due to the ambiguity of a human driver's willingness to cooperate with an autonomous vehicle. Thus, in contrast with existing works that rely on behavior models of human drivers, we take an end-to-end approach and let the autonomous agents implicitly learn the decision-making process of human drivers only from experience. We introduce a multi-agent variant of the synchronous Advantage Actor-Critic (A2C) algorithm and train agents that coordinate with each other and can affect the behavior of human drivers to improve traffic flow and safety.
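The synchronous Advantage Actor-Critic update mentioned in the abstract can be sketched as follows. This is a minimal, framework-free illustration, assuming n-step bootstrapped returns, a squared-error critic loss, and a single set of policy parameters shared by all AVs so that per-agent losses are simply averaged. The `Rollout` fields, `value_coef`, and the averaging scheme are hypothetical choices, not the authors' implementation; in a real implementation these scalars would be autodiff tensors and the advantage would be detached in the actor term.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Rollout:
    """One AV's n-step rollout (hypothetical container)."""
    log_probs: List[float]   # log pi(a_t | o_t) of the meta-actions actually taken
    values: List[float]      # critic estimates V(o_t)
    rewards: List[float]     # per-step rewards r_t (e.g. SVO-blended)
    dones: List[bool]        # True if the episode ended at step t
    bootstrap_value: float   # V(o_{t+n}) used to bootstrap the last return


def n_step_returns(rewards: Sequence[float], dones: Sequence[bool],
                   bootstrap_value: float, gamma: float) -> List[float]:
    """G_t = r_t + gamma * (1 - done_t) * G_{t+1}, with G_{t+n} = bootstrap_value."""
    g = bootstrap_value
    returns = []
    for r, done in zip(reversed(rewards), reversed(dones)):
        g = r + gamma * (0.0 if done else g)
        returns.append(g)
    returns.reverse()
    return returns


def a2c_losses(ro: Rollout, gamma: float = 0.99) -> Tuple[float, float]:
    """Return (actor_loss, critic_loss) for a single agent's rollout."""
    returns = n_step_returns(ro.rewards, ro.dones, ro.bootstrap_value, gamma)
    # Advantage A_t = G_t - V(o_t); with autodiff it would be detached in the actor term.
    advantages = [g - v for g, v in zip(returns, ro.values)]
    n = len(advantages)
    actor_loss = -sum(lp * a for lp, a in zip(ro.log_probs, advantages)) / n
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss


def synchronous_multi_agent_loss(rollouts: Sequence[Rollout], gamma: float = 0.99,
                                 value_coef: float = 0.5) -> float:
    """Average the per-AV losses from rollouts collected in lockstep.

    Assumption: all AVs share one policy/critic, so a single scalar loss
    (and one gradient step) covers every agent.
    """
    actor_total = critic_total = 0.0
    for ro in rollouts:
        actor, critic = a2c_losses(ro, gamma)
        actor_total += actor
        critic_total += critic
    k = len(rollouts)
    return actor_total / k + value_coef * critic_total / k
```

Averaging over agents is one simple way to let every AV's experience update a shared policy; other weightings are equally plausible under the abstract's description.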

Paper Structure

This paper contains 10 sections, 4 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Altruistic AVs can compromise on their individual utility in order to produce socially desirable behaviors that account for other vehicles. (a) Egoistic AVs solely optimize for their own utility and do not allow the merging vehicle to merge; (b) altruistic AVs compromise on their individual utility in order to account for the human-driven vehicles and open up space for the merging vehicle.
  • Figure 2: The Social Value Orientation ring illustrates different behaviors based on a human's or robot's preference to account for others. The diameter of each circle shows the likelihood of a specific behavior. The figure is based on data from garapin2015does.
  • Figure 3: Stacked multi-channel VelocityMap state representation that embeds the speed and position of vehicles. Each observation $o_i$ is a tensor of size $10 \times (4 \times 64 \times 512)$.
  • Figure 4: The multi-agent Advantage Actor-Critic framework and policy dissemination process.
  • Figure 5: Distribution of distance traveled by the merging vehicle when the AVs act egoistically (Gray) as compared to the case with altruistic AVs (Orange).
  • ...and 3 more figures
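The stacked VelocityMap observation described for Figure 3 can be sketched with plain nested lists. The sketch assumes a 64 × 512 grid per channel (rows for lateral position, columns for longitudinal position), four channels with a hypothetical assignment (road, human-driven vehicles, other AVs, ego vehicle), speeds normalized by a hypothetical `v_max`, and a 10-frame history. The paper's actual channel semantics and normalization may differ, and all names below are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence

Grid = List[List[float]]
Frame = List[Grid]  # channels x height x width

# Hypothetical channel assignment; the paper's exact channel semantics may differ.
CH_ROAD, CH_HUMAN, CH_AV, CH_EGO = 0, 1, 2, 3


@dataclass
class Vehicle:
    row: int      # lateral cell index (assumed already discretized)
    col: int      # longitudinal cell index of the rear of the vehicle
    length: int   # footprint length in cells
    speed: float  # speed in m/s
    kind: int     # CH_HUMAN, CH_AV or CH_EGO


def velocity_map_frame(vehicles: Sequence[Vehicle], drivable_rows: Sequence[int],
                       height: int = 64, width: int = 512, channels: int = 4,
                       v_max: float = 40.0) -> Frame:
    """Build one multi-channel VelocityMap frame.

    Cells covered by a vehicle hold its speed normalized to [0, 1] by the
    hypothetical v_max; the road channel marks drivable rows with 1.0.
    """
    frame = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for r in drivable_rows:
        if 0 <= r < height:
            frame[CH_ROAD][r] = [1.0] * width
    for v in vehicles:
        if not 0 <= v.row < height:
            continue  # vehicle outside the observed window
        value = min(max(v.speed / v_max, 0.0), 1.0)
        for c in range(max(v.col, 0), min(v.col + v.length, width)):
            frame[v.kind][v.row][c] = value
    return frame


class VelocityMapStack:
    """Keeps the last `history` frames, giving a history x channels x H x W observation."""

    def __init__(self, history: int = 10):
        self.frames: Deque[Frame] = deque(maxlen=history)

    def push(self, frame: Frame) -> None:
        if not self.frames:
            # On the first step, pad the history by repeating the first frame.
            for _ in range(self.frames.maxlen - 1):
                self.frames.append(frame)
        self.frames.append(frame)

    def observation(self) -> List[Frame]:
        return list(self.frames)
```

A tensor library would normally replace the nested lists; the bounded deque is what supplies the short motion history that a single frame cannot convey.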