Learning-based social coordination to improve safety and robustness of cooperative autonomous vehicles in mixed traffic
Rodolfo Valiente, Behrad Toghi, Mahdi Razzaghpour, Ramtin Pedarsani, Yaser P. Fallah
TL;DR
The paper tackles the safety and robustness of cooperative autonomous vehicles (AVs) in mixed traffic by framing the problem as decentralized multi-agent reinforcement learning with altruistic rewards. It introduces a social-utility-based framework that distinguishes sympathy toward human-driven vehicles (HVs) from cooperation among AVs. The framework is implemented with a 3D-CNN architecture and a safety prioritizer that keeps the AVs safe during both learning and deployment. Key contributions include a partially observable stochastic game (POSG) formulation, a decentralized social reward structure, analyses of domain adaptation and transfer learning, and empirical demonstrations that altruistic AVs can learn to influence HV behavior to improve overall traffic safety and efficiency. The findings suggest that social coordination among AVs, under diverse HV behaviors and scenarios, yields more robust and societally beneficial outcomes than egoistic driving, which informs the future development of socially aware autonomous systems.
Abstract
It is expected that autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HVs) will coexist on the same roads. The safety and reliability of AVs will depend on their social awareness and their ability to engage in complex social interactions in a socially accepted manner. However, AVs still cooperate inefficiently with HVs and struggle to understand and adapt to human behavior, which is particularly challenging in mixed autonomy. On a road shared by AVs and HVs, the social preferences or individual traits of HVs are unknown to the AVs. Moreover, unlike AVs, which are expected to follow a policy, HVs are particularly difficult to forecast because they do not necessarily follow a stationary policy. To address these challenges, we frame the mixed-autonomy problem as a multi-agent reinforcement learning (MARL) problem and propose an approach that allows AVs to learn the decision-making of HVs implicitly from experience, account for the interests of all vehicles, and safely adapt to other traffic situations. In contrast with existing works, we quantify AVs' social preferences and propose a distributed reward structure that introduces altruism into their decision-making process, allowing the altruistic AVs to learn to establish coalitions and influence the behavior of HVs.
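The abstract does not state the reward formula, so the following is only a minimal sketch of how a distributed altruistic reward could separate cooperation among AVs from sympathy toward HVs. It assumes a linear weighting of the AV's own reward, the rewards of nearby AVs, and the rewards of nearby HVs. The names `SocialWeights`, `social_reward`, the weight values, and the "neighbors" restriction (each AV sees only vehicles in its local observation range, which keeps the reward decentralized) are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class SocialWeights:
    """Hypothetical social-preference weights (not taken from the paper)."""
    ego: float = 1.0          # weight on the AV's own (egoistic) reward
    cooperation: float = 0.5  # weight on rewards of other AVs (cooperation)
    sympathy: float = 0.5     # weight on rewards of HVs (sympathy)


def social_reward(agent: str,
                  rewards: Dict[str, float],
                  avs: Set[str],
                  hvs: Set[str],
                  neighbors: Set[str],
                  w: SocialWeights) -> float:
    """Altruistic reward for one AV, computed only from vehicles it can observe.

    Assumption: a linear combination of the agent's own reward, the summed
    rewards of neighboring AVs, and the summed rewards of neighboring HVs.
    """
    # Cooperation term: other AVs within the agent's observation range.
    coop = sum(rewards[j] for j in neighbors if j in avs and j != agent)
    # Sympathy term: HVs within the agent's observation range.
    symp = sum(rewards[j] for j in neighbors if j in hvs)
    return w.ego * rewards[agent] + w.cooperation * coop + w.sympathy * symp


if __name__ == "__main__":
    rewards = {"av1": 1.0, "av2": 0.5, "hv1": -1.0, "hv2": 0.2}
    avs, hvs = {"av1", "av2"}, {"hv1", "hv2"}
    neighbors = {"av2", "hv1"}  # hv2 is outside av1's observation range
    print(social_reward("av1", rewards, avs, hvs, neighbors, SocialWeights()))
    # Setting cooperation and sympathy to zero recovers a purely egoistic AV.
    print(social_reward("av1", rewards, avs, hvs, neighbors,
                        SocialWeights(ego=1.0, cooperation=0.0, sympathy=0.0)))
```

Under these assumed weights, the altruistic AV's reward drops when a nearby HV is doing poorly, which is the kind of signal that could push it to yield or make room; the egoistic variant ignores that HV entirely.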
