Using Surprise Index for Competency Assessment in Autonomous Decision-Making

Akash Ratheesh; Ofer Dagan; Nisar R. Ahmed; Jay McMahon

TL;DR

The paper addresses evaluating competency of autonomous decision-making under uncertainty by introducing a Surprise Index (SI) that quantifies how surprising observed evidence is under a probabilistic model. It derives a closed-form SI for joint Gaussian evidence via the Mahalanobis distance, connects it to a chi-square limit, and extends to dynamic Gauss-Markov filtering with sigma-point uncertainty propagation. The authors validate the approach on a 2-D GPS localization task and apply it to a nonlinear spacecraft maneuver problem with an RL policy, showing SI tracks expected behavior under nominal conditions and flags deviations with lower SI, while discussing computational considerations. The work provides a probabilistic, interpretable complement to NEES/NIS tests for model validity and RL policy evaluation, with potential utility for tuning filters and planners in dynamic, uncertain environments.

Abstract

This paper considers the problem of evaluating an autonomous system's competency in performing a task, particularly when working in dynamic and uncertain environments. The inherent opacity of machine learning models from the user's perspective, often described as a 'black box', poses a challenge. To overcome this, we propose using a measure called the Surprise Index (SI), which leverages available measurement data to quantify whether the dynamic system performs as expected. We show that the SI of observed evidence in a probabilistic model can be computed in closed form for dynamic systems if the joint distribution of that evidence is multivariate Gaussian. We then apply it to a nonlinear spacecraft maneuver problem, where actions are chosen by a reinforcement learning agent, and show that it can indicate how well the trajectory follows the required orbit.
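As a rough illustration of the closed-form Gaussian case, the sketch below computes an SI for 2-D evidence (as in the GPS localization example). It assumes the SI is the probability mass of all outcomes no more likely than the observed one. For Gaussian evidence that set is everything at or beyond the observed Mahalanobis distance, so the SI becomes a chi-square tail probability, which for two dimensions reduces to $\exp(-d^2/2)$. The function names and the sample covariance are hypothetical and are not taken from the paper.

```python
import math


def mahalanobis_sq_2d(e, mu, cov):
    """Squared Mahalanobis distance of a 2-D point e from N(mu, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = e[0] - mu[0], e[1] - mu[1]
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def surprise_index_2d(e, mu, cov):
    """Assumed SI for 2-D Gaussian evidence: P(chi2 with 2 dof >= d^2).

    For 2 degrees of freedom the chi-square survival function is
    exactly exp(-d^2 / 2), so no special functions are needed.
    """
    return math.exp(-0.5 * mahalanobis_sq_2d(e, mu, cov))


if __name__ == "__main__":
    mu = (0.0, 0.0)
    cov = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical sensor covariance
    for e in [(0.0, 0.0), (2.0, 1.0), (6.0, 3.0)]:
        print(e, surprise_index_2d(e, mu, cov))
```

Under this assumption, evidence at the mean gives an SI of 1 and the value decays toward 0 as the evidence moves into the tails, which matches the description of deviations being flagged by a lower SI. For dimensions other than two, the same idea would need a general chi-square survival function instead of the exponential shortcut.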

Paper Structure (10 sections, 22 equations, 5 figures)

Introduction
Background
Approach and Methodology
Validation Example - GPS Localization
Generalization to Dynamic Filtering Problems
Example Problem - Spacecraft Maneuver
Uncertainty Propagation via Sigma Points
Test Case 1 - Nominal Conditions
Test Case 2 - RL Policy Evaluation
Summary

Figures (5)

Figure 1: Ordered probabilistic equivalence sets for Gaussian pdf with $N=2$.
Figure 2: Empirical grid approximation of SI according to Zagorecki et al. (2015) (left) and the new analytical SI (right).
Figure 3: (a) Reference orbit, with red indicating the segment of the orbit that the SI was calculated over. (b) SI results for different true sensor uncertainty values relative to the assumed sensor model.
Figure 4: Trajectories (upper row) and SI evaluation (lower row) for the stationkeeping scenario with different initial uncertainties. (a) Initial uncertainty equal to $P_0$, the value the policy was trained on. (b) and (c) $3$ and $5$ times the trained initial uncertainty, respectively.
Figure 5: Trajectory comparison between a well-performing trajectory (green) and a deviated trajectory (red). (a) Both trajectories against the reference stationkeeping orbit (black). (b) The corresponding SI.
