DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace
Guy Katz, Natan Levy, Idan Refaeli, Raz Yerushalmi
TL;DR
DEM is an output-centric method for certifying DNN predictions in safety-critical aerospace contexts. It treats the network as a black box and assesses the reliability of each individual output by statistically sampling perturbations within an $\epsilon$-neighborhood of the input. Building on PGCR-based concepts and hypothesis testing, it uses offline data to calibrate a separate threshold for each output class, which allows recall-oriented or precision-oriented operation, and it flags unreliable predictions for expert review. The approach accommodates per-output variability, is reported to detect adversarial inputs better than state-of-the-art methods, and aligns with regulatory goals by enabling selective automation within a safety-analysis framework. Experiments on CIFAR-10 with VGG16 and ResNet show robust detection of adversarial inputs and suggest practical potential for certified deployment of DNNs in aerospace, including planned work toward certified co-pilots and enhanced FHA integration.
Abstract
Software development in the aerospace domain requires adherence to strict quality standards. Regulatory guidelines exist for commercial software in this domain (e.g., ARP-4754 and DO-178), but they do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how aerospace systems can benefit from the deep learning revolution. Our work seeks to address this challenge with a novel, output-centric approach to DNN certification. Our method employs statistical verification techniques, and its key advantage is that it can flag specific inputs for which the DNN's output may be unreliable, so that a human expert can inspect them later. To achieve this, the method statistically analyzes the DNN's predictions on other, nearby inputs in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN rather than individual outputs. Our method treats the DNN as a black box and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs into safety-critical applications, especially in the aerospace domain, where high standards of quality and reliability are crucial.
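As a rough illustration of the output-centric idea described above, the following Python sketch queries a black-box classifier on randomly perturbed copies of an input and flags the prediction when too few of the nearby predictions agree with it. It is not the authors' implementation. The uniform sampling in an L-infinity ball, the agreement score, the sample count, the threshold values, and the names `neighborhood_agreement`, `flag_for_review`, and `toy_predict` are all assumptions made for illustration; the abstract does not specify the statistical test or the calibration procedure.

```python
import numpy as np


def neighborhood_agreement(predict, x, epsilon, n_samples=200, rng=None):
    """Estimate how often predictions near x match the prediction at x.

    `predict` is treated as a black box: it maps a batch of inputs to an
    array of integer labels. Perturbations are drawn uniformly from the
    L-infinity ball of radius `epsilon` (an assumption of this sketch).
    """
    rng = np.random.default_rng(rng)
    base_label = int(predict(x[None, ...])[0])
    noise = rng.uniform(-epsilon, epsilon, size=(n_samples,) + x.shape)
    labels = predict(x[None, ...] + noise)
    agreement = float(np.mean(labels == base_label))
    return base_label, agreement


def flag_for_review(predict, x, epsilon, thresholds, n_samples=200, rng=None):
    """Flag the output at x when agreement falls below its class threshold.

    `thresholds` maps each label to a minimum agreement level. In practice
    these would be calibrated offline; here they are placeholder values.
    """
    label, agreement = neighborhood_agreement(predict, x, epsilon, n_samples, rng)
    return label, agreement, agreement < thresholds[label]


def toy_predict(batch):
    """Hypothetical stand-in classifier: label 1 if the first feature is positive."""
    return (batch[:, 0] > 0).astype(int)


if __name__ == "__main__":
    thresholds = {0: 0.9, 1: 0.9}  # placeholder per-class thresholds
    for point in (np.array([1.0, 0.0]), np.array([0.01, 0.0])):
        label, agreement, flagged = flag_for_review(
            toy_predict, point, epsilon=0.1, thresholds=thresholds, rng=0
        )
        print(point, label, agreement, flagged)
```

In this toy setup, an input far from the decision boundary keeps its label under every perturbation and is accepted, while an input close to the boundary yields inconsistent nearby predictions and is flagged for expert review. Keeping one threshold per label mirrors the idea of accommodating per-output variability.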
