OPTIMA: Optimized Policy for Intelligent Multi-Agent Systems Enables Coordination-Aware Autonomous Vehicles

Rui Du, Kai Zhao, Jinlong Hou, Qiang Zhang, Peter Zhang

TL;DR

This work introduces OPTIMA, a novel distributed reinforcement learning framework for cooperative autonomous vehicle tasks. It alternates between thorough data sampling from environmental interactions and training with multi-agent reinforcement learning algorithms to optimize connected and autonomous vehicle (CAV) cooperation, emphasizing both safety and efficiency.

Abstract

Coordination among connected and autonomous vehicles (CAVs) is advancing due to developments in control and communication technologies. However, much current work relies on oversimplified and unrealistic task-specific assumptions, which may introduce vulnerabilities. This is critical because CAVs not only interact with their environment but are also integral parts of it. Insufficient exploration can result in policies that carry latent risks, highlighting the need for methods that explore the environment both extensively and efficiently. This work introduces OPTIMA, a novel distributed reinforcement learning framework for cooperative autonomous vehicle tasks. OPTIMA alternates between thorough data sampling from environmental interactions and training with multi-agent reinforcement learning algorithms to optimize CAV cooperation, emphasizing both safety and efficiency. Our goal is to improve the generality and performance of CAVs in highly complex and crowded scenarios. Furthermore, OPTIMA's industrial-scale distributed training system adapts easily to different algorithms, reward functions, and strategies.
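The alternation described above (collect experience with the current policy, then update the policy on that batch) can be illustrated with a minimal single-process sketch. Everything here is an assumption for illustration, not OPTIMA's implementation: a one-step toy reward stands in for the driving environment, all agents share one policy parameter `theta`, and the learner is plain REINFORCE with a mean-reward baseline rather than whatever multi-agent algorithm the paper uses. In a distributed system, `sample_phase` would run on many rollout workers in parallel.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(action: int) -> float:
    # Hypothetical stand-in for the environment: action 1 ("keep a safe gap")
    # earns 1.0, action 0 ("cut in") earns 0.0.
    return 1.0 if action == 1 else 0.0


def sample_phase(theta: float, num_agents: int, steps: int, rng: random.Random):
    """Sampling phase: every agent acts with the frozen shared policy."""
    p1 = sigmoid(theta)  # probability of action 1 under the current policy
    batch = []
    for _ in range(steps):
        for _agent in range(num_agents):
            action = 1 if rng.random() < p1 else 0
            batch.append((action, toy_reward(action)))
    return batch


def train_phase(theta: float, batch, lr: float) -> float:
    """Training phase: one REINFORCE step with a mean-reward baseline."""
    baseline = sum(r for _, r in batch) / len(batch)
    p1 = sigmoid(theta)
    grad = 0.0
    for action, reward in batch:
        # For a Bernoulli(sigmoid(theta)) policy, d/dtheta log pi(a) = a - p1.
        grad += (action - p1) * (reward - baseline)
    return theta + lr * grad / len(batch)


def train(iterations: int = 50, num_agents: int = 4, steps: int = 32,
          lr: float = 1.0, seed: int = 0) -> float:
    rng = random.Random(seed)
    theta = 0.0  # shared parameter (parameter sharing is an assumption)
    for _ in range(iterations):
        batch = sample_phase(theta, num_agents, steps, rng)  # 1) collect data
        theta = train_phase(theta, batch, lr)                # 2) update policy
    return sigmoid(theta)  # final probability of the "safe" action


print(f"P(keep safe gap) after training: {train():.3f}")
```

Because sampling and updating are separate functions, either side can be swapped (a different reward, a different learner) without touching the other. That separation is one plausible way to get the flexibility the abstract attributes to the training system.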

Paper Structure

This paper contains 19 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The model receives observations of a vehicle and its neighboring vehicles from the environment, denoted $O$ for the vehicle's own observation and $N$ for the neighboring vehicles' observations. A deep reinforcement learning model, denoted $NN$, maps these observations to actions that control the vehicle's responses and maneuvers. A reward function $R$ shapes these actions according to predefined criteria. In addition to the control actions, $NN$ outputs an estimate of future outcomes, denoted $E$.
  • Figure 2: Architecture of the distributed training system.
  • Figure 3: The blue vehicle is penalized for being too close to the red vehicle in front of it. The surrounding green vehicles do not trigger a penalty for the blue vehicle, because each is either far enough away or not directly ahead of it.
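The per-vehicle interface in Figure 1 and the headway penalty in Figure 3 can be sketched together. This is a hypothetical illustration, not the paper's model: the feature layouts of $O$ and $N$, the use of the `k` nearest neighbors, the linear two-headed model standing in for $NN$ (action probabilities plus a value estimate playing the role of $E$), and the `min_gap`, `weight`, and `target_speed` values are all assumptions. "Directly ahead" in Figure 3 is interpreted here as "in the same lane and in front of the ego vehicle".

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    x: float      # longitudinal position along the road (m)
    lane: int     # lane index
    speed: float  # m/s


def own_obs(ego: Vehicle) -> list:
    """Hypothetical layout of O: the ego vehicle's own state."""
    return [ego.speed, float(ego.lane)]


def neighbor_obs(ego: Vehicle, others: list, k: int = 2) -> list:
    """Hypothetical layout of N: relative states of the k nearest vehicles."""
    nearest = sorted(others, key=lambda v: abs(v.x - ego.x))[:k]
    feats = []
    for v in nearest:
        feats += [v.x - ego.x, float(v.lane - ego.lane), v.speed - ego.speed]
    feats += [0.0] * (3 * k - len(feats))  # zero-pad when fewer than k neighbors
    return feats


def headway_penalty(ego, others, min_gap=10.0, weight=1.0) -> float:
    """Penalize only the closest vehicle in the same lane and ahead of ego."""
    gaps = [v.x - ego.x for v in others if v.lane == ego.lane and v.x > ego.x]
    if not gaps:
        return 0.0
    shortfall = max(0.0, min_gap - min(gaps))
    return -weight * shortfall / min_gap if shortfall > 0 else 0.0


def reward(ego, others, target_speed=20.0) -> float:
    """Hypothetical R: speed-tracking term plus the headway penalty."""
    speed_term = -abs(ego.speed - target_speed) / target_speed
    return speed_term + headway_penalty(ego, others)


def forward(obs, w_pi, w_v):
    """Linear two-headed stand-in for NN: action probabilities and a value E."""
    logits = [sum(w * o for w, o in zip(row, obs)) for row in w_pi]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    value = sum(w * o for w, o in zip(w_v, obs))
    return probs, value


# Figure 3 scenario: blue ego, red car 5 m ahead in the same lane,
# one green car far ahead and one green car in the adjacent lane.
blue = Vehicle(x=0.0, lane=1, speed=20.0)
red = Vehicle(x=5.0, lane=1, speed=18.0)
greens = [Vehicle(x=30.0, lane=1, speed=20.0), Vehicle(x=3.0, lane=0, speed=20.0)]

print(headway_penalty(blue, [red] + greens))  # -0.5: only the red car counts
print(headway_penalty(blue, greens))          # 0.0: no green car triggers it

obs = own_obs(blue) + neighbor_obs(blue, [red] + greens)  # length 2 + 3*2 = 8
probs, e_value = forward(obs, [[0.0] * 8 for _ in range(3)], [0.0] * 8)
print(probs, e_value)  # uniform probabilities with untrained zero weights
```

Keeping the penalty as its own function makes it easy to reweight or replace, which matches the caption's framing of the penalty as one ingredient of $R$ rather than the whole reward.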