Evaluation of autonomous systems under data distribution shifts
Daniel Sikar, Artur Garcez
TL;DR
This work tackles safety under data distribution shifts for autonomous perception by introducing distance-based safety thresholds between training and testing data. It combines a Unity-based driving dataset (SDSandbox) with pixel-intensity RGB shifts and RGB→YUV preprocessing to quantify how shifts affect predictive accuracy, and evaluates multiple error and distribution-distance metrics. The study finds that in RGB space, simple histogram-based distances can robustly indicate safe operation with a practical threshold (e.g., Histogram Intersection around 0.40) and a clear safe-shift window of roughly ±40 pixel-intensity levels; YUV-based distances tend to scale differently (exponentially), which complicates threshold selection. The proposed P_safe rule and the preference for fast RGB histogram metrics offer a practical, real-time mechanism to halt operation or hand control to a human when a distribution shift exceeds a defined safety boundary, with implications for deploying autonomous systems in changing environments.
Abstract
We posit that data can only be safe to use up to a certain threshold of the data distribution shift, after which control must be relinquished by the autonomous system and operation halted or handed to a human operator. Using a computer vision toy example, we demonstrate that network predictive accuracy is impacted by data distribution shifts, and we propose distance metrics between training and testing data to define safe operation limits within said shifts. We conclude that beyond an empirically obtained threshold of the data distribution shift, it is unreasonable to expect network predictive accuracy not to degrade.
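To make the idea of a distance-based safety limit concrete, the following is a minimal sketch of a histogram-based check and not the authors' implementation. It assumes images are given as lists of (R, G, B) tuples with 8-bit values, that Histogram Intersection is computed on normalized histograms as a similarity in [0, 1], and that operation counts as safe when the similarity stays at or above the threshold. The function names, the bin count, the direction of the comparison, and the use of 0.40 as a default are all illustrative assumptions.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # hypothetical image representation: 8-bit (R, G, B)


def rgb_histogram(pixels: Iterable[Pixel], bins: int = 32) -> List[float]:
    """Concatenated per-channel intensity histogram, normalized to sum to 1."""
    counts = [0] * (3 * bins)
    total = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an intensity in 0..255 to a bin index in 0..bins-1.
            counts[channel * bins + value * bins // 256] += 1
            total += 1
    if total == 0:
        raise ValueError("cannot build a histogram from an empty image")
    return [c / total for c in counts]


def histogram_intersection(p: List[float], q: List[float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(p, q))


def shift_intensity(pixels: Iterable[Pixel], delta: int) -> List[Pixel]:
    """Add a constant offset to every channel, clipped to the 8-bit range."""
    return [tuple(min(255, max(0, v + delta)) for v in p) for p in pixels]


def is_safe(train_hist: List[float], pixels: Iterable[Pixel],
            threshold: float = 0.40) -> bool:
    """Assumed rule: safe while similarity to the training histogram >= threshold."""
    return histogram_intersection(train_hist, rgb_histogram(pixels)) >= threshold


# Toy data standing in for training imagery.
train_pixels = [(100 + i, 110 + i, 120 + i) for i in range(30)]
train_hist = rgb_histogram(train_pixels)

for delta in (0, 100):
    shifted = shift_intensity(train_pixels, delta)
    print(delta, is_safe(train_hist, shifted))
```

In a deployed system, a `False` result would be the point at which control is halted or handed to a human operator; the threshold itself would have to be obtained empirically, as the abstract states.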
