Autonomous Drifting Based on Maximal Safety Probability Learning

Hikaru Hoshino; Jiaxing Li; Arnav Menon; John M. Dolan; Yorie Nakahira

TL;DR

Physics-informed reinforcement learning can efficiently learn policies that maximize safety probability, and can learn safe behaviors from sparse binary rewards alone.

Abstract

This paper proposes a novel learning-based framework for autonomous driving based on the concept of maximal safety probability. Efficient learning requires rewards that are informative of desirable and undesirable states, but such rewards are hard to design manually because it is difficult to rank the many safe states against one another. Learning policies that maximize safety probability, by contrast, does not require laborious reward shaping, but it is numerically challenging because the algorithms must optimize policies from binary rewards that are sparse in time. Here, we show that physics-informed reinforcement learning can efficiently learn this form of maximally safe policy. Unlike existing drift control methods, our approach does not require a specific reference trajectory or complex reward shaping, and can learn safe behaviors from sparse binary rewards alone. This is enabled by a physics loss that plays a role analogous to reward shaping. The effectiveness of the proposed approach is demonstrated through lane keeping in a normal cornering scenario and safe drifting in a high-speed racing scenario.
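To make the phrase "binary rewards sparse in time" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical one-dimensional noisy system whose safe set is the interval |x| <= bound, and a reward that is 1 only at the last step of an episode that never left the safe set. Under these assumptions the expected return of a policy equals its probability of staying safe over the horizon, so a Monte Carlo average of returns estimates that safety probability. The dynamics, the two policies, and all parameter values are illustrative assumptions.

```python
import random

def rollout(policy, x0=0.0, horizon=50, dt=0.1, sigma=0.5, bound=1.0, rng=random):
    """Simulate one episode of a toy 1D system and return its per-step rewards."""
    x = x0
    rewards = []
    for k in range(horizon):
        u = policy(x)
        # Assumed toy dynamics: controlled drift plus Gaussian noise.
        x = x + u * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        if abs(x) > bound:
            # Left the safe set: the episode ends and no reward is ever paid.
            rewards.append(0.0)
            return rewards
        # Binary reward, sparse in time: 1 only at the final step of a safe episode.
        rewards.append(1.0 if k == horizon - 1 else 0.0)
    return rewards

def estimate_safety_probability(policy, episodes=2000, seed=0):
    """Average return over episodes; with this reward it estimates P(safe over horizon)."""
    rng = random.Random(seed)
    returns = [sum(rollout(policy, rng=rng)) for _ in range(episodes)]
    return sum(returns) / episodes

def zero_policy(x):
    """Hypothetical baseline: apply no control."""
    return 0.0

def centering_policy(x):
    """Hypothetical policy: push the state back toward the center of the safe set."""
    return -2.0 * x

p_zero = estimate_safety_probability(zero_policy)
p_center = estimate_safety_probability(centering_policy)
print(f"no control: {p_zero:.3f}, centering: {p_center:.3f}")
```

Each episode carries at most one nonzero reward, which is why a learner gets so little signal about which intermediate states are preferable. This is the gap that the abstract says the physics loss helps to fill.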

Paper Structure (10 sections, 2 theorems, 24 equations, 8 figures)

INTRODUCTION
Notation
Maximal Safety Probability Learning
Problem Formulation
Physics-informed RL (PIRL)
Numerical Experiments
Training Methodology
Normal Cornering
Safe Drifting
Conclusions

Key Result

Proposition 1

Consider the system (eq:augmented_dynamics) starting from an initial state $s = [\tau, x^\top]^\top \in \mathcal{S}$ and the reward function $r: \mathcal{S} \to \mathbb{R}$ given by [equation not shown] with $\mathcal{G} := [0, \Delta t)$. Then, for a given control policy $\pi$, the value function $v^{\pi}$ defined by [equation not shown], where $N_\mathrm{f} := \inf \{ j \in \mathbb{Z}_+ \,|\, S_j \in \mathcal{S}_\mathrm{abs} \}$, takes a ...

Figures (8)

Figure 1: Framework of training by Physics-informed reinforcement learning (PIRL)
Figure 2: Lane keeping with normal cornering
Figure 3: Training progress for normal cornering task
Figure 4: Effect of $e$ and $\psi$
Figure 5: Effect of $e$ and $v_x$
...and 3 more figures

Theorems & Definitions (4)

Proposition 1
proof
Theorem 1: Hoshino2024:ACC
Remark 1
