DEM: A Method for Certifying Deep Neural Network Classifier Outputs in Aerospace

Guy Katz; Natan Levy; Idan Refaeli; Raz Yerushalmi

TL;DR

DEM is an output-centric method for certifying DNN predictions in safety-critical aerospace contexts. It treats the network as a black box and assesses the reliability of each individual output by statistically sampling perturbations in an $\epsilon$-neighborhood of the input. Using PGCR-based concepts and hypothesis testing, it calibrates output-specific thresholds on offline data, supporting either recall-oriented or precision-oriented operation, and flags unreliable predictions for expert review. The approach accommodates per-output variability, improves adversarial detection over state-of-the-art methods, and aligns with regulatory goals by enabling selective automation within a safety-analysis framework. Empirical results on CIFAR-10 with VGG16/ResNet demonstrate robust detection of adversarial inputs and suggest practical potential for certified deployment of DNNs in aerospace, including future work toward certified co-pilots and tighter FHA integration.

Abstract

Software development in the aerospace domain requires adhering to strict, high-quality standards. While there exist regulatory guidelines for commercial software in this domain (e.g., ARP-4754 and DO-178), these do not apply to software with deep neural network (DNN) components. Consequently, it is unclear how to allow aerospace systems to benefit from the deep learning revolution. Our work here seeks to address this challenge with a novel, output-centric approach for DNN certification. Our method employs statistical verification techniques, and has the key advantage of being able to flag specific inputs for which the DNN's output may be unreliable, so that a human expert can inspect them later. To achieve this, our method conducts a statistical analysis of the DNN's predictions for other, nearby inputs, in order to detect inconsistencies. This is in contrast to existing techniques, which typically attempt to certify the entire DNN, as opposed to individual outputs. Our method treats the DNN as a black box, and makes no assumptions about its topology. We hope that this work constitutes another step towards integrating DNNs in safety-critical applications, especially in the aerospace domain, where high standards of quality and reliability are crucial.
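The core idea of analysing predictions for nearby inputs can be illustrated with a minimal sketch. It is not the paper's implementation: the uniform per-coordinate noise, the function names `count_hits` and `is_flagged`, the `thresholds` dictionary, and the toy stand-in classifier are all assumptions made for illustration, and $\epsilon$ is treated here as an absolute bound rather than a percentage.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]


def count_hits(classify: Classifier, x: Sequence[float], epsilon: float,
               k: int = 1000, seed: int = 0) -> Tuple[int, int]:
    """Return (label, hits): the label predicted for x, and how many of k
    random perturbations of x receive that same label."""
    rng = random.Random(seed)
    label = classify(x)  # the network is only queried, never inspected
    hits = 0
    for _ in range(k):
        # Assumption: uniform noise in [-epsilon, epsilon] per coordinate.
        perturbed = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        if classify(perturbed) == label:
            hits += 1
    return label, hits


def is_flagged(label: int, hits: int, thresholds: Dict[int, int]) -> bool:
    """Flag the output for expert review when the hit count falls below the
    (hypothetical, pre-calibrated) threshold of the predicted label."""
    return hits < thresholds[label]


def toy_classifier(x: Sequence[float]) -> int:
    """Stand-in for a black-box DNN: class 1 if the coordinates sum to > 0."""
    return 1 if sum(x) > 0 else 0
```

With the toy classifier, an input far from the decision boundary such as `[1.0, 1.0]` keeps its label under every perturbation of size 0.1, whereas an input close to the boundary such as `[0.01, 0.0]` loses it for a substantial fraction of perturbations and would be flagged under a high per-label threshold.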

Paper Structure (12 sections, 4 equations, 2 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Background
The Proposed Method
The Inference Phase
Dataset Preparation and Calibration
Maximal-Recall-Oriented Calibration
Maximal-Precision-Oriented Calibration
Evaluation
Evaluating the Recall-Oriented Calibration Algorithm
Evaluating the Precision-Oriented Calibration Algorithm
Discussion and Future Work

Figures (2)

Figure 1: An illustration of CIFAR-10 classifier performance. Each plot corresponds to a single output category. The $Y$-axes show the average number of hits (bold line) and its standard deviation (faded area) for $k=1000$ perturbations around genuine (orange) and adversarial (blue) inputs; the $X$-axes show different values of $\epsilon$ (as a percentage). The goal is as little overlap as possible between the two distributions. Note the significance of $\epsilon$ and how the distributions vary between categories.
Figure 2: Different optimization goals: the recall-oriented algorithm seeks the optimal threshold value that maximizes the separation between genuine and adversarial instances. The precision-oriented algorithm strives to minimize the "yellow" area by determining the lowest threshold that maximizes adversarial input detection and the highest threshold that maximizes genuine input detection, i.e., minimizing the "unknown" cases that fall between the two thresholds.
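The two optimization goals in Figure 2 can be made concrete with a rough sketch for a single output category. The selection rules below are assumptions chosen for illustration and are not the paper's calibration algorithms: the single threshold simply maximizes the number of correctly separated calibration samples, and the two thresholds bracket the region where genuine and adversarial hit counts overlap. All function names are hypothetical.

```python
from typing import Sequence, Tuple


def single_threshold(genuine_hits: Sequence[int],
                     adversarial_hits: Sequence[int]) -> int:
    """Pick t so that as many samples as possible satisfy
    'genuine >= t' or 'adversarial < t' (illustrative criterion)."""
    candidates = sorted(set(genuine_hits) | set(adversarial_hits))
    candidates.append(candidates[-1] + 1)  # allow "everything is below t"

    def score(t: int) -> int:
        return (sum(h >= t for h in genuine_hits)
                + sum(h < t for h in adversarial_hits))

    return max(candidates, key=score)  # ties resolve to the smallest t


def two_thresholds(genuine_hits: Sequence[int],
                   adversarial_hits: Sequence[int]) -> Tuple[int, int]:
    """Return (low, high): below low only adversarial calibration samples
    were seen, at or above high only genuine ones (illustrative rule)."""
    high = max(adversarial_hits) + 1
    low = min(min(genuine_hits), high)  # no overlap: the band collapses
    return low, high


def decide(hits: int, low: int, high: int) -> str:
    """Three-way decision; the band [low, high) is left to a human expert."""
    if hits < low:
        return "adversarial"
    if hits >= high:
        return "genuine"
    return "unknown"
```

Given calibration hit counts for one category, `two_thresholds` yields a band of "unknown" cases. Narrowing that band corresponds to shrinking the yellow area in Figure 2, while `single_threshold` forces a decision for every input.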

Theorems & Definitions (3)

Definition 3.1
Definition 3.2
Definition 3.3
