ROMAN: Reward-Orchestrated Multi-Head Attention Network for Autonomous Driving System Testing

Jianlei Chi, Yuzhen Wu, Jiaxuan Hou, Xiaodong Zhang, Ming Fan, Suhui Sun, Weijun Dai, Bo Li, Jianguo Sun, Jun Sun

TL;DR

ROMAN addresses the problem of thoroughly testing autonomous driving systems (ADS) by generating high-risk, law-violating scenarios. It combines a reward-orchestrated pipeline, a multi-head attention backbone that models multi-vehicle interactions, and an LLM-based risk weighting that prioritizes severe violations; the approach is evaluated in CARLA with Baidu Apollo as the system under test. Key contributions are the scenario-encoding framework, the STL-grounded traffic-law weighting, the proxy-trained generation model, and the integration of RTAMT-based scenario testing, which together yield higher law coverage and greater scenario diversity than the state-of-the-art baselines ABLE and LawBreaker. The authors report that the approach scales across international traffic laws, that the weightings remain robust across multiple LLM backends, and that a fast reward-approximation proxy reduces simulator cost. These results suggest that ROMAN can improve ADS robustness and safety testing, and the authors release their data and code to support adoption.

Abstract

The Automated Driving System (ADS) acts as the brain of an autonomous vehicle and is responsible for its safety and efficiency. Safe deployment requires thorough testing in diverse real-world scenarios and compliance with traffic laws such as speed limits, signal obedience, and right-of-way rules. Violations such as running red lights or speeding pose severe safety risks. However, current testing approaches face two significant challenges: they have limited ability to generate complex, high-risk law-breaking scenarios, and they fail to account for complex interactions involving multiple vehicles and critical situations. To address these challenges, we propose ROMAN, a novel scenario generation approach for ADS testing that combines a multi-head attention network with a traffic law weighting mechanism. ROMAN is designed to generate high-risk violation scenarios that enable more thorough and targeted ADS evaluation. The multi-head attention mechanism models interactions among vehicles, traffic signals, and other factors. The traffic law weighting mechanism implements a workflow that uses an LLM-based risk weighting module to evaluate violations along two dimensions: severity and occurrence. We evaluated ROMAN by testing the Baidu Apollo ADS within the CARLA simulation platform and conducting extensive experiments to measure its performance. Experimental results show that ROMAN surpasses the state-of-the-art tools ABLE and LawBreaker, achieving a 7.91% higher average violation count than ABLE and a 55.96% higher count than LawBreaker, while also maintaining greater scenario diversity. In addition, only ROMAN generated violation scenarios for every clause of the input traffic laws, enabling it to identify more high-risk violations than existing approaches.
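The abstract says violations are weighted along two dimensions, severity and occurrence, but does not give the formula here. The following is a minimal sketch of one plausible combination, assuming FMEA-style multiplication of the two scores followed by normalization. The class `LawRisk`, the functions `risk_weights` and `weighted_violation_reward`, the law identifiers, and the 1 to 10 score range are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LawRisk:
    """Hypothetical per-clause risk scores, e.g. as returned by an LLM."""
    law_id: str
    severity: float    # assumed scale: 1 (minor) to 10 (catastrophic)
    occurrence: float  # assumed scale: 1 (rare) to 10 (frequent)


def risk_weights(risks):
    """Return normalized weights proportional to severity * occurrence.

    The product rule is an assumption made for this sketch; the paper
    may combine the two dimensions differently.
    """
    raw = {r.law_id: r.severity * r.occurrence for r in risks}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one clause needs a positive risk score")
    return {law_id: score / total for law_id, score in raw.items()}


def weighted_violation_reward(weights, violated):
    """Sum the weights of the clauses violated in one scenario."""
    return sum(weights[law_id] for law_id in violated)


# Illustrative scores only; they are not results from the paper.
risks = [
    LawRisk("red_light", severity=9, occurrence=4),
    LawRisk("speeding", severity=6, occurrence=8),
    LawRisk("lane_change_signal", severity=2, occurrence=8),
]
weights = risk_weights(risks)
reward = weighted_violation_reward(weights, {"red_light", "speeding"})
print(weights)
print(round(reward, 2))
```

Under this assumption, a scenario that triggers a severe or frequent violation earns a larger reward than one that triggers only a minor clause, which is the prioritization behavior the abstract describes.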

Paper Structure (18 sections, 8 equations, 6 figures, 7 tables, 3 algorithms)

Figures (6)

  • Figure 1: Overtaking Violation Scenarios Leading to Collisions
  • Figure 2: The Overview of ROMAN
  • Figure 3: Automated Workflow for LLM-based Risk Weighting
  • Figure 4: Comparison of Violation Counts in Scenarios S1 to S4
  • Figure 5: Scenario Diversity Comparison Using DTW Distance Across S1-S4
  • ...and 1 more figure
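
Figure 5 compares scenario diversity using DTW (dynamic time warping) distance. How the paper represents scenarios and aggregates distances is not stated in this excerpt, so the following is only a sketch of the classic DTW recurrence. It assumes each scenario is summarized as a sequence of 2-D positions and that diversity is the mean pairwise distance. The function names `dtw_distance` and `mean_pairwise_dtw` are illustrative.

```python
import math
from itertools import combinations


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with Euclidean point cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("trajectories must be non-empty")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def mean_pairwise_dtw(trajectories):
    """Average DTW distance over all unordered pairs (assumed metric)."""
    pairs = list(combinations(trajectories, 2))
    if not pairs:
        return 0.0
    return sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)


# Toy trajectories: the second repeats a point, the third is one lane over.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
stretched = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]

print(dtw_distance(straight, stretched))
print(dtw_distance(straight, shifted))
print(mean_pairwise_dtw([straight, stretched, shifted]))
```

DTW tolerates differences in timing, so the repeated point in `stretched` adds no distance, while the lateral offset in `shifted` does. A larger mean pairwise distance over a set of generated scenarios indicates that they differ more from one another.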