Ethics-Aware Safe Reinforcement Learning for Rare-Event Risk Control in Interactive Urban Driving

Dianzhao Li, Ostap Okhrin

TL;DR

EthicAR introduces a two-level ethics-aware Safe RL framework for urban driving, integrating an ethics-based risk cost with collision probability and harm into a constrained reinforcement learning objective. It employs an LSTM-enhanced SAC with dynamic prioritized experience replay to learn from rare, high-risk events, and uses a two-stage collision risk model combining SAT overlap checks and Mahalanobis distance, yielding $R_\tau = P^\tau H^\tau$ and $R_{traj} = \max_{\tau} R_\tau$. A hierarchical control stack translates high-level targets into smooth trajectories via a polynomial path planner and PID/Stanley followers, enabling practical, comfortable maneuvers. Across 75 unseen Waymo-derived scenarios, EthicAR achieves 25–45% reductions in conflict frequency relative to task-matched baselines while maintaining ego comfort, and demonstrates that ethically aware optimization can improve safety for VRUs without sacrificing performance.
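
The risk composition named in the TL;DR can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the product of collision probability and harm, and the maximum over $\tau$, come from the summary above. The function names, the treatment of $\tau$ as a time-step index, and the hypothetical `proximity_score` mapping from Mahalanobis distance to a probability-like value are assumptions made here.

```python
import numpy as np


def mahalanobis_distance(point, mean, cov):
    """Mahalanobis distance of `point` from a Gaussian with `mean` and `cov`."""
    diff = np.asarray(point, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ x = diff instead of inverting cov explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def proximity_score(distance):
    """Hypothetical mapping from distance to a score in (0, 1].

    Assumption for illustration only; the source does not state this mapping.
    """
    return float(np.exp(-0.5 * distance ** 2))


def trajectory_risk(probs, harms):
    """Per-step risk is probability times harm; trajectory risk is the maximum."""
    step_risks = np.asarray(probs, dtype=float) * np.asarray(harms, dtype=float)
    return float(step_risks.max())


if __name__ == "__main__":
    d = mahalanobis_distance([3.0, 4.0], [0.0, 0.0], np.eye(2))
    print(d, proximity_score(d))
    print(trajectory_risk([0.1, 0.5, 0.2], [1.0, 0.4, 0.9]))
```

Taking the maximum rather than the mean means a single high-risk step dominates the trajectory score, which fits the stated focus on rare, high-risk events.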

Abstract

Autonomous vehicles hold great promise for reducing traffic fatalities and improving transportation efficiency, yet their widespread adoption hinges on embedding credible and transparent ethical reasoning into routine and emergency maneuvers, particularly to protect vulnerable road users (VRUs) such as pedestrians and cyclists. Here, we present a hierarchical Safe Reinforcement Learning (Safe RL) framework that augments standard driving objectives with ethics-aware cost signals. At the decision level, a Safe RL agent is trained using a composite ethical risk cost, combining collision probability and harm severity, to generate high-level motion targets. A dynamic, risk-sensitive Prioritized Experience Replay mechanism amplifies learning from rare but critical, high-risk events. At the execution level, polynomial path planning coupled with Proportional-Integral-Derivative (PID) and Stanley controllers translates these targets into smooth, feasible trajectories, ensuring both accuracy and comfort. We train and validate our approach on closed-loop simulation environments derived from large-scale, real-world traffic datasets encompassing diverse vehicles, cyclists, and pedestrians, and demonstrate that it outperforms baseline methods in reducing risk to others while maintaining ego performance and comfort. This work provides a reproducible benchmark for Safe RL with explicitly ethics-aware objectives in human-mixed traffic scenarios. Our results highlight the potential of combining formal control theory and data-driven learning to advance ethically accountable autonomy that explicitly protects those most at risk in urban traffic environments. Across two interactive benchmarks and five random seeds, our policy decreases conflict frequency by 25–45% relative to task-matched baselines while keeping comfort metrics within 5%.
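
One way to picture the risk-sensitive Prioritized Experience Replay described in the abstract is to let each transition's priority grow with both its TD error and its risk cost. The sketch below is an assumption-laden illustration rather than the paper's mechanism: the additive form of the priority, the names `risk_weight`, `alpha`, `beta`, and `eps`, and their default values are all hypothetical.

```python
import numpy as np


def sampling_probs(td_errors, risks, risk_weight=1.0, alpha=0.6, eps=1e-6):
    """Sampling probabilities that grow with |TD error| and with risk.

    The additive priority form is an assumption made for illustration.
    """
    td = np.abs(np.asarray(td_errors, dtype=float))
    risk = np.asarray(risks, dtype=float)
    priority = (td + risk_weight * risk + eps) ** alpha
    return priority / priority.sum()


def importance_weights(probs, beta=0.4):
    """Standard PER importance-sampling weights, normalized by their maximum."""
    weights = (len(probs) * np.asarray(probs, dtype=float)) ** (-beta)
    return weights / weights.max()


if __name__ == "__main__":
    # Equal TD errors; only the last transition carries nonzero risk.
    probs = sampling_probs([0.5, 0.5, 0.5], [0.0, 0.0, 2.0])
    print(probs)
    print(importance_weights(probs))
```

With equal TD errors, the risky transition is sampled more often, and its importance weight is correspondingly smaller so that the gradient estimate stays less biased.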

Paper Structure

This paper contains 35 sections, 38 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Overall diagram of the proposed data-driven EthicAR methodology. The hierarchical architecture comprises two levels. At the decision level, a Safe RL agent with an ethics-aware cost function, combining collision probability and harm severity, learns to output target motions. A dynamic PER emphasizes rare, high-risk events during training. The execution level then converts these high-level commands into smooth, feasible trajectories via a polynomial path planner and tracks them using PID/Stanley controllers, ensuring safe, comfortable maneuvers.
  • Figure 2: Training rewards and costs of different agents, together with the evolution of Lagrange multipliers in safe RL. Results are obtained with the cost mode set to ethical and the cost limit fixed at 1.
  • Figure 3: Constraint compliance of Safe RL agents across thresholds and modes. Compliance is shown for four agents (EthicAR, SACLag, EthicAR w/o PER, and LSTMSAC) across six cost thresholds ($\eta = 0.1\sim2$). Results are reported for all steps (right panels) and risky steps only (with a risk greater than 0), under two training modes: ethical (top) and selfish (bottom). EthicAR achieves the highest compliance, particularly in the suitable threshold range ($\eta = 0.6\sim1$), while SACLag learns partially, and EthicAR w/o PER fails to adapt to rare but critical violations. The LSTMSAC agent shows near-zero compliance across all settings. The selfish mode yields similar outcomes, confirming robustness across configurations.
  • Figure 4: Risk distributions of EthicAR and other agents evaluated under three different cost limits. The first row illustrates the ego vehicle risk, while the second row depicts the risk to other traffic participants.
  • Figure 5: Distributions of acceleration (top) and jerk (bottom) for EthicAR versus baseline agents under three cost limits.
  • ...and 5 more figures
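
The Lagrange multipliers and cost limit mentioned in the captions of Figures 2 and 3 follow the usual pattern of Lagrangian Safe RL, which can be sketched as a projected dual-ascent step. This is a generic illustration under stated assumptions, not the authors' training code: the function name `update_multiplier`, the learning rate `lr`, and the use of a batch-mean cost are hypothetical choices.

```python
def update_multiplier(lam, mean_cost, cost_limit, lr=0.01):
    """One dual-ascent step on the Lagrange multiplier.

    The multiplier rises when the observed cost exceeds the limit and
    falls otherwise; it is clipped at zero so it stays non-negative.
    """
    return max(0.0, lam + lr * (mean_cost - cost_limit))


def penalized_objective(reward, cost, lam):
    """Reward minus the multiplier-weighted cost, as seen by the policy."""
    return reward - lam * cost


if __name__ == "__main__":
    lam = 0.5
    # Cost above the limit of 1 pushes the multiplier up.
    lam = update_multiplier(lam, mean_cost=2.0, cost_limit=1.0, lr=0.1)
    print(lam)
    print(penalized_objective(reward=1.0, cost=2.0, lam=lam))
```

A larger multiplier makes constraint violations more expensive for the policy, which is why its evolution during training is informative alongside the reward and cost curves.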