System Safety Monitoring of Learned Components Using Temporal Metric Forecasting

Sepehr Sharifi, Andrea Stocco, Lionel C. Briand

TL;DR

This work addresses runtime safety monitoring of learned components in safety-critical autonomous systems by forecasting the future distribution of a safety metric with DL-based probabilistic time-series forecasters. It formulates the problem as predicting hazard-relevant safety metrics from the learned component's past outputs and the system's operational context, and evaluates four state-of-the-art models on autonomous taxiing (ACT) and autonomous driving (ADS) case studies, focusing on forecast accuracy and inference latency. Across the experiments, Temporal Fusion Transformer (TFT) consistently gives the most accurate predictions of imminent violations with acceptable latency and memory use, while DeepAR is stronger at specific quantiles or with large horizons; the results are backed by statistical analysis. The findings support deploying TFT-based safety monitors in resource-constrained environments and yield practical guidelines for horizon settings and data requirements, with implications for other domains and for future refinements using regression-based scenario analysis and expanded system-in-the-loop (SITL) datasets.

Abstract

In learning-enabled autonomous systems, safety monitoring of learned components is crucial to ensure their outputs do not lead to system safety violations, given the operational context of the system. However, developing a safety monitor for practical deployment in real-world applications is challenging because access to the internal workings and training data of the learned component is limited. Furthermore, safety monitors should predict safety violations with low latency, while consuming a reasonable amount of computation. To address these challenges, we propose a safety monitoring method based on probabilistic time series forecasting. Given the learned component outputs and an operational context, we empirically investigate different Deep Learning (DL)-based probabilistic forecasting models to predict the objective measure capturing the satisfaction or violation of a safety requirement (safety metric). We empirically evaluate the accuracy of safety metric and violation prediction, as well as the inference latency and resource usage, of four state-of-the-art models with varying horizons, using autonomous aviation and autonomous driving case studies. Our results suggest that probabilistic forecasting of safety metrics, given learned component outputs and scenarios, is effective for safety monitoring. Furthermore, for both case studies, Temporal Fusion Transformer (TFT) was the most accurate model for predicting imminent safety violations, with acceptable latency and resource consumption.

Paper Structure (59 sections, 5 equations, 9 figures, 6 tables)

This paper contains 59 sections, 5 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: The overall process for training the temporal safety metric forecaster for safety monitoring.
  • Figure 2: System-in-the-loop simulation of the ACT system and the X-Plane simulator (a), and the ADS system and the Udacity simulator (b).
  • Figure 3: Average q-Risk values and their corresponding 95% confidence interval ($CI_{0.95}$) for all models over all reported quantiles, for the $cte$ and $he$ safety requirements of the autonomous taxiing (ACT) case study and the $cte$ safety requirement of the autonomous driving (ADS) case study, respectively. Note that the x-axis is not drawn to scale, in favor of a more readable presentation.
  • Figure 4: False Negatives (FN) and False Positives (FP) at different prediction quantiles (q), for $cte$ and $he$ safety requirements of the ACT case study and the $cte$ safety requirement of the ADS case study, respectively. Note that the x-axis is not drawn to scale, in favor of a more readable presentation.
  • Figure 5: Safety metric prediction accuracy metrics for various window configurations of DeepAR, MQCNN, Seq2Seq and TFT, for the $cte$ and $he$ safety requirements, respectively.
  • ...and 4 more figures
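
The captions of Figures 3 and 4 refer to q-Risk and to false negatives and false positives at a prediction quantile q. The sketch below shows one way these quantities could be computed. It assumes q-Risk is the normalized quantile (pinball) loss commonly used in the probabilistic forecasting literature, and that a violation means the safety metric exceeds a fixed threshold. The function names, the threshold, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def quantile_loss(y: float, y_hat: float, q: float) -> float:
    """Pinball loss of a q-quantile forecast y_hat against the observed value y."""
    return q * max(y - y_hat, 0.0) + (1.0 - q) * max(y_hat - y, 0.0)


def q_risk(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Normalized quantile loss (assumed definition of q-Risk):
    2 * sum of pinball losses / sum of |observed values|."""
    total = sum(quantile_loss(y, p, q) for y, p in zip(y_true, y_pred))
    denom = sum(abs(y) for y in y_true)
    if denom == 0.0:
        raise ValueError("q-Risk is undefined when all observed values are zero")
    return 2.0 * total / denom


def violation_errors(
    y_true: Sequence[float], y_pred: Sequence[float], threshold: float
) -> Tuple[int, int]:
    """Count (false negatives, false positives), assuming a violation
    is a safety metric value strictly above the threshold."""
    fn = sum(1 for y, p in zip(y_true, y_pred) if y > threshold and p <= threshold)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y <= threshold and p > threshold)
    return fn, fp


# Hypothetical observed metric values and their 0.9-quantile forecasts.
observed = [1.0, 3.0, 2.0]
forecast_q90 = [1.5, 2.0, 3.5]

risk = q_risk(observed, forecast_q90, q=0.9)
fn, fp = violation_errors(observed, forecast_q90, threshold=2.5)
```

Under these assumptions, monitoring with a higher quantile makes the forecast more conservative, which tends to trade false negatives for false positives. That trade-off is what a plot of FN and FP against q would show.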