Robust Adversarial Quantification via Conflict-Aware Evidential Deep Learning

Charmaine Barker, Daniel Bethell, Simos Gerasimou

TL;DR

C-EDL significantly outperforms state-of-the-art EDL variants and competitive baselines, achieving substantial reductions in coverage for OOD data and adversarial data, across a range of datasets, attack types, and uncertainty metrics.

Abstract

Reliability of deep learning models is critical for deployment in high-stakes applications, where out-of-distribution (OOD) or adversarial inputs may lead to detrimental outcomes. Evidential Deep Learning (EDL), an efficient paradigm for uncertainty quantification, models predictions as Dirichlet distributions obtained in a single forward pass. However, EDL is particularly vulnerable to adversarially perturbed inputs, on which it makes overconfident errors. Conflict-aware Evidential Deep Learning (C-EDL) is a lightweight post-hoc uncertainty quantification approach that mitigates these issues, enhancing adversarial and OOD robustness without retraining. C-EDL generates diverse, task-preserving transformations per input and quantifies representational disagreement to calibrate uncertainty estimates when needed. C-EDL's conflict-aware prediction adjustment improves detection of OOD and adversarial inputs while maintaining high in-distribution accuracy and low computational overhead. Our experimental evaluation shows that C-EDL significantly outperforms state-of-the-art EDL variants and competitive baselines, achieving substantial reductions in coverage for OOD data (up to $\approx$55%) and adversarial data (up to $\approx$90%), across a range of datasets, attack types, and uncertainty metrics.
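To make the phrase "models predictions as Dirichlet distributions" concrete, the sketch below uses the standard EDL convention in which non-negative per-class evidence $e_k$ gives Dirichlet parameters $\alpha_k = e_k + 1$, strength $S = \sum_k \alpha_k$, and uncertainty mass $u = K / S$. The function name `dirichlet_summary` and the example evidence values are illustrative assumptions, not taken from the paper.

```python
def dirichlet_summary(evidence):
    """Map non-negative per-class evidence to EDL-style outputs.

    Assumes the common EDL parameterisation alpha_k = e_k + 1.
    Returns (expected class probabilities, uncertainty mass u = K / S).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]    # Dirichlet parameters
    strength = sum(alpha)                  # Dirichlet strength S
    probs = [a / strength for a in alpha]  # expected class probabilities
    vacuity = k / strength                 # uncertainty mass u
    return probs, vacuity


# No evidence at all: uniform prediction, maximal uncertainty.
probs_none, u_none = dirichlet_summary([0.0, 0.0, 0.0])

# Strong evidence for class 0: confident prediction, low uncertainty.
probs_conf, u_conf = dirichlet_summary([9.0, 0.0, 0.0])
```

With zero evidence the uncertainty mass is 1, and with evidence `[9, 0, 0]` it drops to 3/12 = 0.25. An adversarial input that inflates evidence for a wrong class would lower `u` in the same way, which is the overconfidence failure the abstract refers to.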

Paper Structure

This paper contains 25 sections, 1 theorem, 26 equations, 17 figures, 13 tables.

Key Result

Theorem 1

The conflict measure $C$ is bounded in $(0,1]$, tends towards $0$ if and only if all transformations produce identical Dirichlet parameters concentrated on a single class, and is monotonically non-decreasing in both intra-class and inter-class conflict for $\lambda \in [0, \frac{1}{2}]$.

Figures (17)

  • Figure 1: Uncertainty on ID (MNIST) vs. OOD (FashionMNIST). Both methods stay low on ID, but C-EDL assigns higher values to OOD where EDL remains low.
  • Figure 2: Evidence on a FashionMNIST OOD input under increasing L2PGD perturbations. Solid bars: EDL, gridded bars: C-EDL, dotted line: ID–OOD threshold. C-EDL stays low under attack, while EDL misclassifies as ID.
  • Figure 3: Overview of Conflict-aware Evidential Deep Learning (C-EDL) approach, with its key post-hoc steps that advance regular EDL highlighted in blue. For each new input, C-EDL performs $T$ metamorphic transformations, yielding a label-preserving evidence set, and then executes conflict adjustment on the accumulated evidence to calibrate the final prediction. When applied to in-distribution inputs, C-EDL closely matches the original EDL output, while given out-of-distribution inputs, C-EDL amplifies uncertainty to better reflect model disagreement.
  • Figure 4: Visualised adversarial AUROC plots, including the binary decision threshold for OOD/Adv rejection, for comparative methods where the ID dataset is MNIST and the OOD dataset is FashionMNIST. Full plots are in the appendix section "Extended Analysis of Core Results".
  • Figure 5: Adversarial coverage (circle size) compared with the mean difference between the computed abstention metric and the decision threshold ($\Delta$, circle colour; see the $\Delta$ results table) for all approaches on all evaluated datasets. Positive values indicate that predictions tend to be above the threshold, while negative values indicate that predictions fall below the threshold (as expected for adversarially attacked data). Smaller circles and more negative $\Delta$ values (darker green) are ideal, reflecting stronger confidence that the input is adversarial.
  • ...and 12 more figures
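
The post-hoc flow described in the Figure 3 caption (collect evidence from $T$ transformed views, measure disagreement, then adjust the accumulated evidence) can be sketched roughly as follows. This is a toy illustration only: `toy_conflict` is a stand-in disagreement score based on total variation distance, not the paper's conflict measure $C$, and the exponential scaling with a `decay` parameter is likewise an assumed adjustment rule chosen for simplicity. The evidence vectors stand in for the outputs of an EDL model on each transformed input.

```python
import math


def expected_probs(evidence):
    """Expected class probabilities under alpha_k = e_k + 1."""
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    return [a / strength for a in alpha]


def toy_conflict(evidence_set):
    """Stand-in disagreement score in [0, 1); NOT the paper's measure C.

    Mean total variation distance between each view's expected
    probabilities and the average over all views.
    """
    probs = [expected_probs(e) for e in evidence_set]
    t, k = len(probs), len(probs[0])
    mean_p = [sum(p[c] for p in probs) / t for c in range(k)]
    tv = [0.5 * sum(abs(p[c] - mean_p[c]) for c in range(k)) for p in probs]
    return sum(tv) / t


def conflict_adjusted_vacuity(evidence_set, decay=5.0):
    """Average the evidence over views, shrink it by the toy conflict,
    and return the resulting uncertainty mass u = K / S.

    `decay` is a hypothetical knob controlling how strongly
    disagreement reduces the evidence.
    """
    t, k = len(evidence_set), len(evidence_set[0])
    mean_e = [sum(e[c] for e in evidence_set) / t for c in range(k)]
    scale = math.exp(-decay * toy_conflict(evidence_set))
    adjusted = [scale * m for m in mean_e]
    return k / (sum(adjusted) + k)


# Views that agree (ID-like behaviour): the evidence is left untouched.
agree = [[20.0, 0.0, 0.0], [20.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
# Views that each favour a different class (OOD/adversarial-like behaviour).
disagree = [[20.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, 20.0]]

u_agree = conflict_adjusted_vacuity(agree)
u_disagree = conflict_adjusted_vacuity(disagree)
```

In this toy setting the agreeing views keep a low uncertainty mass, while the disagreeing views have their evidence shrunk and their uncertainty raised, mirroring the qualitative behaviour the caption describes for in-distribution and out-of-distribution inputs.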

Theorems & Definitions (2)

  • Theorem 1
  • proof