Perceived risk evolution in automated driving inferred from large-scale discrete ratings

Xiaolin He, Zirui Li, Xinwei Wang, Riender Happee, Meng Wang

Abstract

Perceived risk in automated driving is often measured as discrete scores that summarise the riding experience, but such scores cannot distinguish volatile peaks from sustained elevation. Here we treat discrete clipwise ratings as constraints on an unobserved latent evolution and apply a kernel-constrained inverse model to infer the temporal evolution of perceived risk. Across 2,164 participants and 141,628 discrete clipwise ratings spanning 236 hours of scripted motorway interactions, we infer evolutions under kernel constraints whose shapes follow priors from independent handset-based ratings and whose timing is fixed by scripted manoeuvre markers. The inferred perceived risk evolutions differentiate accumulated perceived risk from within-clip concentration, revealing scenario differences that are not identifiable from peak judgements alone. We then predict these inferred evolutions from observable vehicle and relative-motion cues under strict event-level holdout using a deep neural network, enabling interpretable attribution analyses. Attribution shows distinct patterns between risk-rising and risk-falling segments, with a shift toward conflict cues in the rising phase and a rebound toward stability cues in the falling phase. Attribution concentration increases only modestly at high perceived risk levels. These results move beyond treating perceived risk as a single severity score by characterising within-episode dynamics and phase-dependent cue associations in scripted motorway interactions.
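As a rough illustration of how clipwise ratings can act as constraints on a continuous evolution, the sketch below builds only the forward direction: a candidate evolution formed as a superposition of kernels placed at manoeuvre markers, and the within-clip maxima that such an evolution would imply. The Gaussian kernel shape, its width, the marker times, and the amplitudes are all assumptions chosen for illustration; the paper's actual kernels, priors, and inverse-fitting procedure are not reproduced here.

```python
import numpy as np

def gaussian_kernel(t, centre, width):
    """Hypothetical response kernel: a unit-height Gaussian bump (assumed shape)."""
    return np.exp(-0.5 * ((t - centre) / width) ** 2)

def risk_evolution(t, markers, amplitudes, width=1.5):
    """Superpose one kernel per manoeuvre marker to form a candidate r(t)."""
    r = np.zeros_like(t, dtype=float)
    for centre, amp in zip(markers, amplitudes):
        r += amp * gaussian_kernel(t, centre, width)
    return r

def clipwise_maxima(t, r, clip_edges):
    """Peak of r(t) inside each clip [start, end), mimicking a
    'most dangerous moment' rating for that clip."""
    starts, ends = clip_edges[:-1], clip_edges[1:]
    return np.array([r[(t >= a) & (t < b)].max() for a, b in zip(starts, ends)])

# 18 s event sampled at 10 Hz, split into three six-second clips.
t = np.arange(180) / 10.0
markers = [4.0, 10.0]      # illustrative marker times in seconds (assumed)
amplitudes = [0.5, 2.0]    # illustrative kernel amplitudes (assumed)

r = risk_evolution(t, markers, amplitudes)
y = clipwise_maxima(t, r, clip_edges=[0.0, 6.0, 12.0, 18.0])
print(np.round(y, 2))
```

An inverse model would search for amplitudes (and, where permitted, kernel shape parameters) such that the implied clipwise maxima agree with the observed ratings; that fitting step is deliberately left out of this sketch.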

Paper Structure

This paper contains 87 sections, 57 equations, 23 figures, 16 tables.

Figures (23)

  • Figure 1: Schematic overview of the study motivation, the full workflow from data collection to inference, and the decoding and attribution analyses. a, Automated driving as a safety-critical model system for human assessments in dynamic interactions. Safety-relevant judgements must be formed under stringent control demands and rapidly changing relative motion cues, yet retrospective summaries can assign similar overall judgements to episodes that differ in within-episode dynamics, such as a brief high peak versus sustained elevation. Perceived risk is used as an interpretable user-side appraisal with measurement utility, computational relevance, and links to behaviour regulation. b, End-to-end workflow from scripted motorway scenarios to inferred evolution. Scripted events are rendered as videos, segmented into sequential clips, and rated online with instructions to reflect the most dangerous moment within each clip. The resulting clip ratings are treated as constraints on an unobserved event-level evolution $r(t)$, expressed as a superposition of response kernels whose locations are anchored to pre-specified manoeuvre timing markers and whose shapes are grounded in an independent driving simulator study with handset-based continuous ratings collected throughout events. c, Decoding the inferred evolution from kinematic cues and testing cue utilisation through attribution analysis. A neural network predicts $\hat{r}(t)$ from kinematic cues under strict event-level holdout, and attribution to the validated predictor is summarised by inferred intensity and by local direction of change (rising, stable, or falling). This analysis tests whether cue utilisation differs across periods and whether attribution concentration shows mild focussing at higher inferred risk. Colours indicate cue families: orange denotes conflict-related cues and green denotes stability-related cues.
  • Figure 2: Kernel-constrained inference of perceived risk evolution with validation against time-continuous perceived risk ratings and reference psychological measurements in an independent simulator experiment. a, Representative online study events from each scenario show inferred evolution of perceived risk (solid black) with uncertainty intervals (grey shading). Events were selected algorithmically to maximise typicality (minimising deviation from the scenario group mean) and to minimise inference error (error against the discrete ratings), ensuring the plots illustrate characteristic dynamics (see SI Appendix, Section 2B.6 for selection criteria). Horizontal dashed segments indicate the clipwise discrete ratings, and vertical dashed lines mark pre-specified manoeuvre timing markers from the scripted events. Vig marks the scripted interaction onset (neighbouring vehicle appearance or approach), LC marks lane change onset, and Brk marks braking onset of the neighbouring vehicle. b, Validation against independent driving simulator data for the merging with hard braking scenario. For held-out simulator events (nine events, E11-E19), discrete constraints were constructed to match the online protocol by taking the within-clip maxima of the 10 Hz time-continuous ratings in three six-second clips, together with boundary constraints. The perceived risk evolution was then inferred under the same kernel constraints and simulator-informed priors. Example panels show the collected time-continuous ratings alongside the inferred evolutions and uncertainty intervals. The scatter plots summarise event-wise agreement metrics across the nine held-out events, including RMSE, rank correlation, and uncertainty coverage, with mean RMSE 0.29, mean Spearman correlation 0.72, and mean coverage 92.1%. Results for all nine simulator events are in SI Appendix, Section 2C. c, Correspondence with reference measurements in the simulator study. For the same held-out events, peak inferred perceived risk shows a clearer association with peak pupil response than with peak heart rate, which is also more variable across events, despite using the same fixed time window aligned to brake onset. Event-level temporal correspondence is quantified using cross-correlations between each physiological signal and the inferred evolution across time lags. We report, for each signal, the lag that maximises correlation (negative indicates physiology leading the inferred risk) and the corresponding maximal correlation coefficient. Across events, pupil dynamics tend to precede the inferred risk (median lag $=-0.40\,\mathrm{s}$), whereas heart rate tends to follow (median lag $=+2.50\,\mathrm{s}$).
  • Figure 3: Why peak judgements are insufficient and why inferred evolution is needed. a, Two clips can exhibit similar peak perceived risk $P$ while cumulative perceived risk $A$ and time-averaged perceived risk $E$ differ, showing that peak summaries can miss sustained elevation within the same clip duration. b, The mapping from the discrete clip rating $y_k$ to time-averaged perceived risk $E_k$ depends on interaction type. Clips were binned by $y_k$ and $E_k$ was summarised within each bin for each scenario. Points denote bin medians and vertical bars denote interquartile ranges. c, Temporal concentration $F_k$ differs across interaction types. Violin plots summarise $F_k$ for clips with $F_k<1$, with horizontal bars indicating medians. The fraction of clips at the boundary value $F_k=1$ is reported separately for each scenario. Global and pairwise statistics are reported in the main text, with full post hoc results reported in SI Appendix, Table S5. Together, a--c show that discrete ratings and peak summaries do not capture differences in time-averaged perceived risk and temporal concentration, motivating inference of a latent evolution of perceived risk from which complementary clip level summaries of $P$, $A$, $E$, and $F$ can be derived.
  • Figure 4: Kinematics to inferred evolution mapping holds under event holdout. a, Unified scenario decoding under strict event-level four-fold cross-validation. Top row shows representative held-out events from HB, MB, LC, and SVM comparing the inferred perceived risk evolution (black) with predictions from the unified DNN (orange dashed, mean across independent initialisations) and physics-based baselines PCAD (green dash-dot) and DRF (red dotted). The DNN follows both the onset and recovery profile, whereas the baselines frequently show abrupt spikes or misaligned decays. b, Calibration within each scenario, summarised using binned conditional means, where the horizontal axis is the inferred risk mean within a bin and the vertical axis is the predicted mean within the same bin. Shaded points denote the density of time samples, the dashed diagonal indicates the identity relation, and overlaid curves show bin means for each model. Marginal histograms show the distribution of binned inferred and predicted means. c, Cross-scenario stress test. The row shows quantile-binned conditional means for the held-out scenario after adding one event or ten events from the target scenario to the training set, with the dashed diagonal indicating the identity relation. d, Few-shot adaptation. The row shows raincloud plots of event RMSE as the number of few-shot events added increases from one to ten, combining a half-violin density, jittered event points, and a box plot summary. Box plots indicate the median and interquartile range, and whiskers extend to 1.5 times the interquartile range.
  • Figure 5: State dependent cue reweighting with a physical anchoring check. a, Model physical anchoring and state dependence for the same ordered feature set. The beeswarm shows signed Shapley values for the top features, with points coloured by feature value. b, The state heat map reports mean absolute Shapley values within each of the nine risk states using the identical feature ordering. c, The table provides descriptions for the same ordered features. All signed kinematic variables are expressed in a vehicle fixed frame, with longitudinal positive forward and lateral positive left. Variables carrying $\mathrm{mean}$ denote a local average computed within a 50 time step window (5 s), whereas variables without $\mathrm{mean}$ denote the instantaneous value at the same time point. An asterisk (*) marks the PCAD manoeuvre uncertainty term, which provides a conservative anticipatory surrogate beyond instantaneous kinematics (definitions of all features are detailed in SI Appendix, Section 4D). d, Cue family allocation across nine risk states. For each event, absolute Shapley values were normalised to sum to one, then summed within each cue family to yield a cue family share, and finally aggregated across events. Risk states are defined by inferred risk intensity (low, medium, high) combined with local phase (falling, stable, rising). Bars show the mean across events and error bars indicate 95% bootstrap intervals with event resampling. e, Paired within event contrasts. The left distribution shows the within event difference in conflict cue share between rising and stable segments within the same intensity tier, and the right distribution shows the corresponding difference in stability cue share between falling and rising segments. Points denote events and violin envelopes show empirical distributions; horizontal bars indicate medians. f, Modest increase in attribution concentration with inferred risk intensity. At each time point, absolute Shapley values were normalised to sum to one across features, and the top one share was defined as the largest resulting feature share. The mean top one share was summarised as a function of inferred risk using two binning schemes, quantile bins and equal width bins (legend). Within each bin and for each scheme, we used matched event count resampling to aggregate the same number of events per bin and to obtain uncertainty intervals. Lines show bin means and shaded regions indicate 95% intervals.
  • ...and 18 more figures
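
Two quantities described in the captions above lend themselves to a short numerical sketch: the clip-level summaries of Figure 3a (peak $P$, cumulative $A$, and time-averaged $E$) and the top one share of Figure 5f. The code below is a minimal illustration, not the authors' implementation. It assumes that $A$ is a rectangle-rule integral of the inferred evolution over the clip and that $E$ is $A$ divided by the clip duration; the function names and the toy arrays are hypothetical. The top one share follows the caption's wording: absolute Shapley values are normalised to sum to one across features at each time point and the largest share is taken.

```python
import numpy as np

def clip_summaries(r, dt):
    """Return (P, A, E) for one clip of an evolution sampled every dt seconds.

    P: peak value; A: cumulative value (rectangle-rule integral, an assumption);
    E: time-averaged value, taken here as A divided by the clip duration.
    """
    r = np.asarray(r, dtype=float)
    peak = float(r.max())
    cumulative = float(r.sum() * dt)
    average = cumulative / (len(r) * dt)
    return peak, cumulative, average

def top_one_share(shap_values):
    """Largest normalised absolute attribution per time point.

    shap_values has shape (time points, features). Rows whose attributions
    are all zero yield a share of 0 rather than dividing by zero.
    """
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    totals = abs_vals.sum(axis=1, keepdims=True)
    safe_totals = np.where(totals > 0, totals, 1.0)
    return (abs_vals / safe_totals).max(axis=1)

# Two toy six-second clips at 10 Hz with the same peak but different duration
# of elevation (values are illustrative, not data from the study).
dt = 0.1
brief = np.zeros(60)
brief[20:25] = 3.0        # 0.5 s spike
sustained = np.zeros(60)
sustained[10:50] = 3.0    # 4 s plateau

P1, A1, E1 = clip_summaries(brief, dt)
P2, A2, E2 = clip_summaries(sustained, dt)

# Toy attributions for two time points and three features.
shares = top_one_share([[0.2, -0.6, 0.2], [0.1, 0.1, 0.1]])
```

In this toy case both clips share the same $P$ while $A$ and $E$ differ by a factor of eight, which is the kind of difference a peak-only rating cannot express. The first attribution row is dominated by one feature (share 0.6), whereas the second is spread evenly (share one third).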