Multi-Objective Reinforcement Learning for Critical Scenario Generation of Autonomous Vehicles
Jiahui Wu, Chengjie Lu, Aitor Arrieta, Shaukat Ali
TL;DR
Problem: autonomous-vehicle (AV) dependability requires generating scenarios that jointly violate interdependent safety and functional requirements. Approach: MOEQT formulates AV testing as a multi-objective Markov decision process (MOMDP) and applies Envelope Q-learning, which adaptively adjusts objective weights, to configure testing environments and induce violations across multiple requirements. Contributions: formalizes AV testing as a MOMDP with a vector reward $\mathbf{R}(t)$ derived from per-step objective measures, evaluates the approach on six CARLA roads with the Interfuser controller, and shows that it outperforms a random strategy and a fixed-weight single-objective RL baseline in generating scenarios that violate multiple requirements. Significance: provides principled, adaptive scenario generation for comprehensive AV testing in dynamic environments, improving the ability to assess dependability and safety margins.
Abstract
Autonomous vehicles (AVs) make driving decisions without human intervention, so ensuring their dependability is critical. Despite significant research and development, AV dependability assurance remains a major challenge due to the complexity and unpredictability of the operating environments. Scenario-based testing evaluates AVs under various driving scenarios, but because the number of potential scenarios is unbounded, it is essential to identify critical scenarios that can violate safety or functional requirements. Such requirements are inherently interdependent and need to be tested simultaneously. To this end, we propose MOEQT, a novel multi-objective reinforcement learning (MORL)-based approach that generates critical scenarios to simultaneously test interdependent safety and functional requirements. MOEQT adapts Envelope Q-learning as its MORL algorithm, which dynamically adjusts the objective weights to balance the relative importance of the objectives. By dynamically interacting with the AV's environment, MOEQT generates critical scenarios that violate multiple requirements, supporting comprehensive AV testing. We evaluate MOEQT using an advanced end-to-end AV controller and a high-fidelity simulator, and compare it with two baselines: a random strategy and a single-objective RL approach with a weighted reward function. Our evaluation results show that MOEQT achieves better overall performance than the baselines in identifying critical scenarios that violate multiple requirements.
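To make the contrast between a fixed weighted reward and adaptive objective weights concrete, the following is a minimal sketch and not MOEQT's implementation. It assumes each action has a vector-valued Q estimate with one entry per requirement, samples a weight vector from the simplex, and picks the action with the largest weighted sum. The function names, action names, and Q values are hypothetical, and the sketch omits the envelope update that Envelope Q-learning uses during training.

```python
import random


def sample_weights(n, rng):
    """Draw a weight vector uniformly from the simplex (non-negative, sums to 1)."""
    # Normalizing i.i.d. exponential draws yields a Dirichlet(1, ..., 1) sample.
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]


def scalarize(weights, values):
    """Weighted sum of a vector of per-objective values."""
    return sum(w * v for w, v in zip(weights, values))


def greedy_action(q_vectors, weights):
    """Return the action whose vector Q-value has the largest weighted sum."""
    return max(q_vectors, key=lambda a: scalarize(weights, q_vectors[a]))


if __name__ == "__main__":
    # Hypothetical vector Q-values: [safety objective, functional objective].
    q_vectors = {
        "add_pedestrian": [0.9, 0.1],
        "change_weather": [0.2, 0.8],
    }
    rng = random.Random(0)
    for _ in range(3):
        w = sample_weights(2, rng)
        print([round(x, 2) for x in w], greedy_action(q_vectors, w))
```

A fixed-weight single-objective baseline corresponds to always calling `greedy_action` with the same `weights`, whereas varying the weights lets the chosen action shift between objectives.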
