Does Unpredictability Influence Driving Behavior?

Sepehr Samavi, Florian Shkurti, Angela P. Schoellig

TL;DR

The paper addresses how surrounding driver unpredictability influences ego-vehicle lane-change behavior. It introduces an unpredictability metric derived from the error of a trajectory predictor and embeds it as an additional feature in a Maximum Entropy Inverse Reinforcement Learning framework to learn two lane-change reward functions (baseline and unpredictability-aware). Evaluations on datasets including US-101, I-80, and highD show the unpredictability-aware rewards produce trajectories that better fit human data, with an average MEE improvement of about 5.9% on test sets and qualitatively more cautious maneuvers when adjacent cars are unpredictable. The work suggests unpredictability is a valuable signal for human-aligned planning and paves the way for exploring alternate predictors and nonlinear reward structures in driving policy design.

Abstract

In this paper we investigate the effect of the unpredictability of surrounding cars on an ego-car performing a driving maneuver. We use Maximum Entropy Inverse Reinforcement Learning to model reward functions for an ego-car conducting a lane change in a highway setting. We define a new feature based on the unpredictability of surrounding cars and use it in the reward function. We learn two reward functions from human data: a baseline and one that incorporates our defined unpredictability feature, then compare their performance with a quantitative and qualitative evaluation. Our evaluation demonstrates that incorporating the unpredictability feature leads to a better fit of human-generated test data. These results encourage further investigation of the effect of unpredictability on driving behavior.
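
The idea of turning predictor error into a reward feature can be illustrated with a minimal sketch. It is not the authors' implementation: the constant-velocity predictor, the function names (`constant_velocity_prediction`, `unpredictability`, `linear_reward`), the window lengths, and the weights are all assumptions chosen for illustration, and the paper's own predictor and feature set may differ.

```python
import numpy as np


def constant_velocity_prediction(history, horizon):
    """Extrapolate the last observed per-step displacement for `horizon` steps.

    Assumption: a constant-velocity model stands in for the trajectory
    predictor; the source does not specify the predictor in this section.
    """
    history = np.asarray(history, dtype=float)
    velocity = history[-1] - history[-2]          # displacement per time step
    steps = np.arange(1, horizon + 1)[:, None]    # column vector 1..horizon
    return history[-1] + steps * velocity


def unpredictability(track, history_len=5, horizon=5):
    """Mean Euclidean error between predicted and observed future positions."""
    track = np.asarray(track, dtype=float)
    history = track[:history_len]
    future = track[history_len:history_len + horizon]
    predicted = constant_velocity_prediction(history, len(future))
    return float(np.mean(np.linalg.norm(predicted - future, axis=1)))


def linear_reward(theta, features):
    """Linear reward: weighted sum of features (hypothetical weights)."""
    return float(np.dot(theta, features))


# Two hypothetical adjacent-car tracks as (x, y) positions per time step.
t = np.arange(10)
straight = np.stack([t, np.zeros(10)], axis=1)
zigzag = np.stack([t, 0.5 * (-1.0) ** t], axis=1)

u_straight = unpredictability(straight)
u_zigzag = unpredictability(zigzag)

# Hypothetical feature vector: [baseline feature, unpredictability feature].
reward_near_zigzag = linear_reward([1.0, -2.0], [0.5, u_zigzag])
reward_near_straight = linear_reward([1.0, -2.0], [0.5, u_straight])
```

Under these assumptions the zigzagging car scores a higher unpredictability than the straight-driving one, so a negative weight on that feature lowers the reward of states near it. In an inverse reinforcement learning setting the weights would be learned from human demonstrations rather than set by hand.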

Paper Structure (12 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: We compare a snapshot (at $t=3s$) of ego car lane change trajectories generated using the baseline (red) and the unpredictability-aware (green) models, for identical adjacent cars (gray and orange) and lanes. The history of each vehicle is illustrated with decreasing opacity. The erratic adjacent vehicle is perturbed to zigzag. We observe that the trajectory from the unpredictability-aware model (green) delays changing into the target lane and maintains a higher distance from the erratic car (orange) than the baseline (red). Video at https://tiny.cc/unpredi.
  • Figure 2: State values over time. We compare the human ego-car lane change trajectory (blue) with trajectories generated by optimizing the baseline reward, $\mathbold{\theta}_{w}$ (red), and the unpredictability-aware reward, $\mathbold{\theta}_{w+}$ (green), on the US-101 $t1$ test dataset. We plot the average (solid lines) and 3-$\sigma$ bounds (translucent fill) of the values. In the lateral direction (a) and heading (c), the $\mathbold{\theta}_{w+}$ model (green) is more similar to the human data (blue) than the $\mathbold{\theta}_{w}$ model (red) is.
  • Figure 3: Snapshot at $t=5.5s$ of the baseline (red) and unpredictability-aware (green) trajectories, compared to the human in the test set (blue). In this scenario none of the adjacent vehicles are unpredictable. The unpredictability-aware model (green) is very similar to the baseline (red).
  • Figure 4: Snapshot at $t=3.2s$. In this scenario, the adjacent vehicle preceding the ego car behaves unpredictably by changing lanes in front of the ego car, which is also changing lanes. The unpredictability-aware model (green) delays changing into the target lane and maintains a greater distance from the preceding vehicle than the baseline (red), similar to the human (blue).
  • Figure 5: Snapshot at $t=0.5s$. In this scenario, the adjacent vehicle in the target lane speeds up to prevent the ego vehicle from entering the target lane. The unpredictability-aware model (green) delays entering the target lane, similar to the human (blue), while the baseline model (red) cuts in front of the adjacent car.
  • ...and 1 more figure