Can we Defend Against the Unknown? An Empirical Study About Threshold Selection for Neural Network Monitoring

Khoi Tran Dang, Kevin Delmas, Jérémie Guiochet, Joris Guérin

TL;DR

This work tackles threshold selection for neural network runtime monitors operating under unknown threats. It compares four strategies for building the threshold optimization set (ID, ID+T, ID+O, ID+T+O) across three image datasets and multiple monitor types, evaluating each fixed threshold on evaluation sets that include threats. The study finds that using knowledge of the anticipated target threat (ID+T) yields the strongest thresholding performance, while incorporating generic threats (ID+O, ID+T+O) can degrade robustness to unforeseen threats; the choice of effectiveness measure also significantly shapes outcomes. These results challenge the reliance on threshold-agnostic metrics and offer practical guidance for deploying robust monitors, highlighting the tradeoffs between safety and availability and suggesting future work on narrower threat categories and broader task generalization.
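The summary names the four strategies but does not spell out how each set is assembled. A minimal sketch, assuming each strategy simply concatenates the data pools its name lists (ID = in-distribution data, T = the anticipated target threat, O = generic threats), might look as follows. The names `STRATEGIES`, `build_optimization_set`, and `pools`, the `(score, is_unsafe)` sample layout, and all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: every name and number here is illustrative.
# Each sample is a (monitor_score, is_unsafe) pair.
STRATEGIES = {
    "ID": ("id",),
    "ID+T": ("id", "target"),
    "ID+O": ("id", "other"),
    "ID+T+O": ("id", "target", "other"),
}


def build_optimization_set(strategy, pools):
    """Concatenate the data pools that a strategy is allowed to see."""
    samples = []
    for pool_name in STRATEGIES[strategy]:
        samples.extend(pools[pool_name])
    return samples


pools = {
    "id": [(0.10, 0), (0.20, 0), (0.30, 0)],  # in-distribution data (toy)
    "target": [(0.70, 1), (0.80, 1)],         # anticipated target threat (toy)
    "other": [(0.40, 1), (0.90, 1)],          # generic threats (toy)
}

sizes = {name: len(build_optimization_set(name, pools)) for name in STRATEGIES}
```

Under this reading, the strategies differ only in which pools are visible when the threshold is tuned; the evaluation set is built separately.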

Abstract

With the increasing use of neural networks in critical systems, runtime monitoring becomes essential to reject unsafe predictions during inference. Various techniques have emerged to establish rejection scores that maximize the separability between the distributions of safe and unsafe predictions. The efficacy of these approaches is mostly evaluated using threshold-agnostic metrics, such as the area under the receiver operating characteristic curve. However, in real-world applications, an effective monitor also requires identifying a good threshold to transform these scores into meaningful binary decisions. Despite the pivotal importance of threshold optimization, this problem has received little attention. A few studies touch upon this question, but they typically assume that the runtime data distribution mirrors the training distribution, which is a strong assumption as monitors are supposed to safeguard a system against potentially unforeseen threats. In this work, we present rigorous experiments on various image datasets to investigate: (1) the effectiveness of monitors in handling unforeseen threats, which are not available during threshold adjustment; and (2) whether integrating generic threats into the threshold optimization scheme can enhance the robustness of monitors.
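To make the step from rejection scores to binary decisions concrete, here is a minimal sketch of threshold selection. It assumes that a higher score means "more likely unsafe", that the monitor rejects when the score is at or above the threshold, and that the effectiveness measure is the g-mean (geometric mean of true-positive and true-negative rates). The function names and the toy data are hypothetical, not taken from the paper.

```python
import math


def g_mean(scores, labels, threshold):
    """Geometric mean of the true-positive and true-negative rates.

    Assumptions: label 1 marks an unsafe prediction, label 0 a safe one,
    and the monitor rejects whenever score >= threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        return 0.0  # the measure is undefined without both classes
    return math.sqrt((tp / pos) * (tn / neg))


def select_threshold(scores, labels, measure=g_mean):
    """Return the observed score that maximizes `measure` as a threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: measure(scores, labels, t))


# Toy optimization set: the known threat is well separated from safe data.
opt_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
opt_labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = select_threshold(opt_scores, opt_labels)

# Toy evaluation set: an unforeseen threat scores lower than the known one.
eval_scores = [0.15, 0.25, 0.35, 0.45, 0.5, 0.55, 0.85]
eval_labels = [0, 0, 0, 1, 1, 1, 1]
eval_gmean = g_mean(eval_scores, eval_labels, threshold)
```

In this toy case the threshold is perfect on the optimization set but misses most of the unforeseen threat at evaluation time, which is the kind of gap a threshold-agnostic metric does not expose.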

Paper Structure (23 sections, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Conceptual Overview -- This research compares four ways to construct threshold optimization sets for neural network runtime monitors, each representing distinct assumptions about the data available for threshold tuning.
  • Figure 2: Optimization sets comparison -- Critical distance diagram (Nemenyi test). The horizontal axis represents the average rank of the strategies. A black bar connecting two or more strategies indicates no significant difference.
  • Figure 3: Visual example to explain our findings -- Distributions of monitoring scores for the Optimization and Evaluation sets. Selected example: ID data: CIFAR10, threat: FGSM, NN: Resnet, monitor: Mahalanobis. Vertical lines represent thresholds obtained with different effectiveness measures. In (d), the dashed (resp. plain) lines represent thresholds obtained with OS+F1 (resp. g-mean). The "Optimal" thresholds maximize the effectiveness measures on the Evaluation set.
  • Figure 4: Threshold Optimization sets comparison, with OS+F1 as the effectiveness measure -- Critical distance diagram showing the results of the Nemenyi test. The horizontal axis represents the average rank of the approaches. A black bar connecting two or more approaches indicates no significant difference.
  • Figure 5: Threshold Optimization sets comparison, with g-mean as the effectiveness measure -- Critical distance diagram showing the results of the Nemenyi test. The horizontal axis represents the average rank of the approaches. A black bar connecting two or more approaches indicates no significant difference.
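The critical distance diagrams in Figures 2, 4, and 5 rest on two quantities: the average rank of each strategy across experiments, and the Nemenyi critical distance CD = q_alpha * sqrt(k(k+1) / (6N)) for k strategies and N experiments. A minimal sketch of both is given below. The helper names are hypothetical, the q_alpha values are the standard alpha = 0.05 constants tabulated by Demšar (2006), and the score table is made up for illustration, not taken from the paper.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, indexed by the
# number of compared strategies k (as tabulated by Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def average_ranks(results):
    """results[i][j] is the score of strategy j on experiment i.

    Higher scores are better, rank 1 is best, and tied strategies
    share the mean of the ranks they span.
    """
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mean_rank = (i + j) / 2 + 1
            for idx in order[i:j + 1]:
                totals[idx] += mean_rank
            i = j + 1
    return [t / len(results) for t in totals]


def critical_distance(k, n):
    """Nemenyi critical distance for k strategies over n experiments."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6 * n))


names = ["ID", "ID+T", "ID+O", "ID+T+O"]
# Made-up effectiveness scores, one row per experiment (NOT paper results).
results = [
    [0.70, 0.90, 0.60, 0.80],
    [0.65, 0.85, 0.70, 0.75],
    [0.72, 0.88, 0.72, 0.80],
]
ranks = average_ranks(results)
cd = critical_distance(k=len(names), n=len(results))
# Two strategies differ significantly when their rank gap reaches cd.
id_vs_idt_differ = abs(ranks[0] - ranks[1]) >= cd
```

In a diagram, strategies whose average ranks lie within `cd` of each other are the ones joined by a black bar. With only three toy experiments the critical distance is large, so no pair is separated here.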