Interacting safely with cyclists using Hamilton-Jacobi reachability and reinforcement learning

Aarati Andrea Noronha, Jean Oh

TL;DR

The approach integrates Hamilton-Jacobi reachability analysis with deep Q-learning to jointly address safety guarantees and time-efficient navigation. The resulting safety metric is incorporated as a structured reward signal within a reinforcement learning framework, and the cyclist's latent response to the vehicle is modeled through the disturbance inputs.

Abstract

In this paper, we present a framework for enabling autonomous vehicles to interact with cyclists in a manner that balances safety and optimality. The approach integrates Hamilton-Jacobi reachability analysis with deep Q-learning to jointly address safety guarantees and time-efficient navigation. A value function is computed as the solution to a time-dependent Hamilton-Jacobi-Bellman inequality, providing a quantitative measure of safety for each system state. This safety metric is incorporated as a structured reward signal within a reinforcement learning framework. The method further models the cyclist's latent response to the vehicle, allowing disturbance inputs to reflect human comfort and behavioral adaptation. The proposed framework is evaluated in simulation and compared against human driving behavior and an existing state-of-the-art method.
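The abstract states that the reachability value serves as a structured reward for deep Q-learning but does not give the reward's form. The sketch below shows one hypothetical way to wire this up. `safety_value` is a closed-form placeholder (a braking-distance margin) standing in for the HJ value function; the weights, the action set `ACTIONS`, and the reward terms are assumptions; and a tabular Q-learning update stands in for the deep Q-network.

```python
from collections import defaultdict

ACTIONS = ("brake", "hold", "accelerate")  # hypothetical discrete action set


def safety_value(gap_m, closing_speed_mps, max_brake_mps2=6.0, min_gap_m=1.5):
    """Placeholder for the HJ value function V(x), NOT the paper's solver:
    the gap left after braking to zero closing speed, minus a clearance.
    V > 0 is treated as safe, V <= 0 as inside the unsafe set."""
    stopping_dist = max(closing_speed_mps, 0.0) ** 2 / (2.0 * max_brake_mps2)
    return gap_m - stopping_dist - min_gap_m


def shaped_reward(gap_m, closing_speed_mps, reached_goal,
                  step_cost=0.1, goal_bonus=10.0, unsafe_penalty=10.0,
                  w_safety=0.5, v_cap=2.0):
    """Task reward (time cost + goal bonus) plus a safety term built from V.
    All weights are assumed values chosen for illustration."""
    v = safety_value(gap_m, closing_speed_mps)
    if v <= 0.0:
        return -unsafe_penalty  # state lies in the (approximate) unsafe set
    task = -step_cost + (goal_bonus if reached_goal else 0.0)
    # Cap the safety bonus so it rewards margin without swamping progress.
    return task + w_safety * min(v, v_cap)


def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step; a DQN replaces the table with a network."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]
```

For a 10 m gap closing at 6 m/s the placeholder margin is 5.5 m, so a non-terminal step earns the capped safety bonus (reward 0.9 with these weights), while a 2 m gap at the same closing speed is penalized. Replacing the table `Q` with a neural network and a replay buffer keeps the same bootstrapped target.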

Paper Structure (20 sections, 15 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: The value function is plotted on the left side; for the sake of visualization, the lateral range has been restricted to 1.5 m. The right side shows the zero sub-level set of the viscosity solution to the Hamilton-Jacobi-Bellman partial differential equation. In each annotated box, $X$ represents the longitudinal range, $Y$ the longitudinal range rate, and $Level$ the negated value function. The aqua green region represents the backward reachable set. (a) Value function computed using our algorithm. (b) Value function computed using the algorithm used by Fisac et al. (2019).
  • Figure 2: A visualization of a cyclist event from the test set. A red line denotes the trajectory computed using the level set from Fisac et al. A dark blue line denotes the ground truth, i.e., the behavior of human drivers. A cyan blue line indicates the trajectory traversed by our framework. A green line denotes the trajectory of the cyclist. The position of each agent at a given time instant is marked by a dot on its trajectory in the corresponding color. The start and goal positions are denoted by a black star and a black circle, respectively.
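Figure 1 plots the value function over longitudinal range $X$ and range rate $Y$, with its zero sub-level set marking the backward reachable set. The following is a minimal sketch of how such a set can be approximated, assuming double-integrator relative dynamics ($\dot X = Y$, $\dot Y = d - u$, where $u$ is the vehicle's acceleration and the cyclist's acceleration $d$ acts as the disturbance), a nearest-neighbor grid, and made-up bounds (`A_MAX`, `D_MAX`, `MIN_GAP` are hypothetical). This discrete-time min-max recursion is swapped in for a level-set PDE solver and is not the authors' algorithm. Here $V \le 0$ marks the backward reachable set; the caption's $Level$ is the negated value, so sign conventions may differ.

```python
DT = 0.1        # time step [s]
A_MAX = 4.0     # vehicle acceleration/braking bound [m/s^2] (hypothetical)
D_MAX = 2.0     # cyclist acceleration bound, the disturbance [m/s^2] (hypothetical)
MIN_GAP = 1.5   # gap below which we count a collision [m] (hypothetical)

XS = [0.5 * i for i in range(41)]          # longitudinal range X: 0..20 m
YS = [-10.0 + 1.0 * j for j in range(21)]  # range rate Y: -10..10 m/s


def margin(x):
    """l(x): signed margin to the failure set {X < MIN_GAP}."""
    return x - MIN_GAP


def nearest(grid, value):
    """Index of the uniform-grid point nearest to value, clamped to the grid."""
    step = grid[1] - grid[0]
    i = int(round((value - grid[0]) / step))
    return min(max(i, 0), len(grid) - 1)


def reachability_value(horizon_steps=30):
    """Backward recursion V_k(s) = min(l(s), max_u min_d V_{k+1}(f(s, u, d)))."""
    V = [[margin(x) for _ in YS] for x in XS]  # terminal condition V_T = l
    for _ in range(horizon_steps):
        new_V = [[0.0] * len(YS) for _ in XS]
        for i, x in enumerate(XS):
            for j, y in enumerate(YS):
                xn = x + y * DT  # Euler step for the range
                best = float("-inf")
                for u in (-A_MAX, 0.0, A_MAX):      # vehicle maximizes V
                    worst = float("inf")
                    for d in (-D_MAX, 0.0, D_MAX):  # cyclist minimizes V
                        yn = y + (d - u) * DT       # Euler step for range rate
                        worst = min(worst, V[nearest(XS, xn)][nearest(YS, yn)])
                    best = max(best, worst)
                new_V[i][j] = min(margin(x), best)
        V = new_V
    return V


V = reachability_value()
brs = [(x, y) for i, x in enumerate(XS) for j, y in enumerate(YS) if V[i][j] <= 0.0]
```

The cell at $X = 2$ m, $Y = -10$ m/s has a positive margin yet lands in `brs`: one step later the gap is 1 m, inside the failure set, whatever either agent does. That gap between $\{l \le 0\}$ and $\{V \le 0\}$ is what the backward reachable set captures.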