Efficient Preimage Approximation for Neural Network Certification

Anton Björklund, Mykola Zaitsev, Marta Kwiatkowska

TL;DR

This work tackles robustness certification of neural networks under patch attacks by extending the PREMAP preimage-approximation framework to larger, CNN-based models. It introduces tighter intermediate bounds, adaptive Monte Carlo sampling, and refined branching heuristics to enable scalable preimage analysis, achieving substantial speedups and higher completion rates on patch-certification tasks and reinforcement learning benchmarks. The approach is demonstrated across patch robustness, out-of-distribution detection, and explainability use cases, highlighting the practical value of preimage under- and over-approximations for reliability and safety in real-world AI systems. Overall, the enhancements widen the applicability of preimage-based certification and offer a versatile toolkit for validating robustness and reliability in safety-critical contexts.

Abstract

The growing reliance on artificial intelligence in safety- and security-critical applications demands effective neural network certification. A challenging real-world use case is "patch attacks", where adversarial patches or lighting conditions obscure parts of images, for example, traffic signs. A significant step towards certification against patch attacks was recently achieved using PREMAP, which uses under- and over-approximations of the preimage, the set of inputs that lead to a specified output, for the certification. While the PREMAP approach is versatile, it is currently limited to fully-connected neural networks of moderate dimensionality. In order to tackle broader real-world use cases, we present novel algorithmic extensions to PREMAP involving tighter bounds, adaptive Monte Carlo sampling, and improved branching heuristics. Firstly, we demonstrate that these efficiency improvements significantly outperform the original PREMAP and enable scaling to convolutional neural networks that were previously intractable. Secondly, we showcase the potential of preimage approximation methodology for analysing and certifying reliability and robustness on a range of use cases from computer vision and control.
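To make the idea of a preimage under-approximation and its Monte Carlo evaluation concrete, the following is a minimal sketch on a toy problem. It is not the PREMAP implementation: the two-input network, the output condition, and all function names are hypothetical and chosen only for illustration. The under-approximation is obtained from the standard linear lower bound relu(z) >= z, so every point it contains is guaranteed to lie in the true preimage, and sampling estimates how much of the preimage it covers.

```python
import random


def relu(z):
    return max(z, 0.0)


def network(x1, x2):
    # Hypothetical toy network (an assumption, not a model from the paper).
    return relu(x1) + relu(x2) - 0.5


def in_preimage(x1, x2):
    # True preimage of the output set {y >= 0}.
    return network(x1, x2) >= 0.0


def in_under_approx(x1, x2):
    # Replace each ReLU by the linear lower bound relu(z) >= z.
    # The resulting half-space is contained in the preimage.
    return x1 + x2 - 0.5 >= 0.0


def estimate_coverage(n_samples, seed=0):
    """Monte Carlo estimate of vol(under-approx) / vol(preimage)
    over the input box [-1, 1]^2, plus a sampled soundness check."""
    rng = random.Random(seed)
    n_pre = 0
    n_under = 0
    sound = True
    for _ in range(n_samples):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        pre = in_preimage(x1, x2)
        under = in_under_approx(x1, x2)
        n_pre += pre
        n_under += under
        if under and not pre:
            sound = False  # would indicate an invalid under-approximation
    coverage = n_under / n_pre if n_pre else 0.0
    return coverage, sound


if __name__ == "__main__":
    coverage, sound = estimate_coverage(20000)
    print(f"estimated coverage: {coverage:.3f}, sound on samples: {sound}")
```

For this toy problem the exact ratio can be computed by hand: the half-space covers an area of 1.125 of the preimage's 1.875, i.e. 0.6, so the estimate should land close to that value. A single linear bound leaves a sizeable gap here, which is the kind of gap that splitting the problem into subregions is meant to close.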

Paper Structure

This paper contains 27 sections, 11 equations, 12 figures, 7 tables, 2 algorithms.

Figures (12)

  • Figure 1: Examples of physical patch attacks. Left: real graffiti \cite{eykholt2018robustphysicalworldattacksdeep}. Middle: sunbeam \cite{cao2024secure}. Right: our abstraction of a patch attack.
  • Figure 2: Linear bounds for inactive, active and unstable ReLU neurons.
  • Figure 3: Visualizing three neuron selection heuristics: area (yellow area), under (green distance), and extra (the average blue length at the samples).
  • Figure 4: The effect of using various heuristics for selecting which neuron to split (each dot represents a different combination). The black line is a single-variable linear regression, where a positive slope is correlated with faster improvements; see Eq. \ref{eq:cov_delta}.
  • Figure 5: Time to calculate a preimage under-approximation with a 10-minute time limit and a $0.9$ threshold. The patches are applied to random images (from every class) with random positions and sizes. We group the results based on whether the centre of the patch is within 10 pixels of the centre.
  • ...and 7 more figures