MiniFool -- Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

Lucie Flek, Oliver Janik, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Matthias Thiesmeyer, Christopher Wiebusch, Ulrich Willemsen

TL;DR

MiniFool proposes physics-constraint-aware adversarial attacks that perturb inputs under experimental uncertainties to flip neural network classifications in physics analyses. It optimizes a combined cost $\lambda = \alpha\,\eta + \beta\,\bigl(f_{i^*}(x^{a};\theta) - g\bigr)^2$ with $\eta = \frac{1}{N}\sum_i \bigl((x_i^0 - x_i^{a})/\sigma_i\bigr)^2$ and $\sigma_i$ scaled by an attack parameter $s$, enabling per-event robustness assessment. The method is demonstrated on MNIST, IceCube's $\nu_\tau$ search, and CMS Open Data b-jet tagging, showing that robustness can be quantified by $s$-scans and that misclassified events are generally more susceptible to perturbations than correctly classified ones. The work provides an open-source implementation and suggests physics-informed adversarial testing as a tool to verify and strengthen neural-network analyses in physics contexts, with future directions including richer uncertainty models and applications beyond classification.
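The combined cost can be evaluated in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes that the attack parameter s multiplies each σ_i, that the network score f_{i*} is passed in as a plain callable (the name `score` is hypothetical), and that α and β default to 1.

```python
from typing import Callable, Sequence


def chi2_term(x0: Sequence[float], xa: Sequence[float],
              sigma: Sequence[float], s: float = 1.0) -> float:
    """eta = (1/N) * sum_i ((x0_i - xa_i) / (s * sigma_i))**2.

    Assumption: the attack parameter s scales sigma_i multiplicatively.
    """
    n = len(x0)
    return sum(((a - b) / (s * sg)) ** 2
               for a, b, sg in zip(x0, xa, sigma)) / n


def minifool_cost(x0: Sequence[float], xa: Sequence[float],
                  sigma: Sequence[float],
                  score: Callable[[Sequence[float]], float],
                  target: float, s: float = 1.0,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """lambda = alpha * eta + beta * (score(xa) - target)**2.

    `score` is a hypothetical stand-in for the network output of the
    attacked class; `target` plays the role of g.
    """
    eta = chi2_term(x0, xa, sigma, s)
    return alpha * eta + beta * (score(xa) - target) ** 2
```

A larger s shrinks the penalty for the same perturbation, so the minimizer is allowed to move the input further from the original before the first term dominates.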

Abstract

In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. We initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory; to demonstrate its general applicability, we also apply it to data from other domains, namely the well-known MNIST data set and Open Data from the CMS experiment at the Large Hadron Collider. The algorithm minimizes a cost function that combines a $\chi^2$-based test statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data, given the experimental uncertainties. For the use cases studied, we find that the likelihood of a flipped classification differs between initially correctly and initially incorrectly classified events. Testing how the classification changes as a function of an attack parameter that scales the experimental uncertainties quantifies the robustness of the network decision. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.
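The scan over the attack parameter can be illustrated on a toy problem. The sketch below is an illustration under stated assumptions, not the MiniFool code: the network is replaced by a hypothetical two-input logistic score (`toy_score`), the minimizer is plain gradient descent with an analytic gradient, the optimization runs over the pulls z_i = (x_i^a - x_i^0)/(s σ_i), and the weights, β = 10, step size, and step count are arbitrary choices.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def toy_score(x, w=(2.0, -1.0), b=0.0):
    """Hypothetical stand-in for the network score f(x; theta)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def attack(x0, sigma, s, target=0.0, alpha=1.0, beta=10.0,
           w=(2.0, -1.0), b=0.0, steps=2000, lr=0.05):
    """Minimize alpha*eta + beta*(f - target)**2 by gradient descent.

    The free parameters are the pulls z_i, so that eta = mean(z_i**2)
    and the attacked input is xa_i = x0_i + s * sigma_i * z_i.
    """
    n = len(x0)
    z = [0.0] * n
    for _ in range(steps):
        xa = [xi + s * sg * zi for xi, sg, zi in zip(x0, sigma, z)]
        f = toy_score(xa, w, b)
        # Derivative of beta*(f - target)**2 with respect to the logit,
        # using the sigmoid derivative f*(1 - f).
        common = 2.0 * beta * (f - target) * f * (1.0 - f)
        grad = [2.0 * alpha * zi / n + common * wi * s * sg
                for zi, wi, sg in zip(z, w, sigma)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    xa = [xi + s * sg * zi for xi, sg, zi in zip(x0, sigma, z)]
    return xa, toy_score(xa, w, b)


def s_scan(x0, sigma, s_values, **kwargs):
    """Return (s, attacked score) pairs for each attack parameter."""
    return [(s, attack(x0, sigma, s, **kwargs)[1]) for s in s_values]


if __name__ == "__main__":
    x0, sigma = (1.0, 0.5), (0.5, 0.5)
    print("original score:", toy_score(x0))
    for s, f in s_scan(x0, sigma, [0.1, 0.5, 1.0, 2.0]):
        print(f"s = {s}: attacked score = {f:.3f}")
```

Working in pull space keeps the curvature of the χ² term independent of s, so one step size serves the whole scan. The value of s at which the attacked score crosses the decision threshold then acts as a per-event robustness measure, and computing it requires no truth label.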

Paper Structure

This paper contains 7 sections, 9 equations, 8 figures.

Figures (8)

  • Figure 1: Simple network for the classification of MNIST images.
  • Figure 2: Example of MiniFool attacks on MNIST images. The upper row shows a correctly classified "9"; the bottom row shows a "9" that is wrongly classified as an "8". The left column shows the original images, the middle column the images perturbed by MiniFool with an attack parameter $s=0.2$, and the right column the applied optimal perturbations. The respective classification scores are shown above the images.
  • Figure 3: Average softmax score of the initially predicted class as a function of the attack parameter. The solid lines show the mean over correctly and incorrectly classified samples, while the shaded bands indicate one standard deviation. Misclassified samples exhibit a faster confidence decay under adversarial perturbation.
  • Figure 4: Example image of a $\nu_\tau$ candidate event recorded in Nov. 2019. The image shows the data recorded by the DOMs on the string closest to the interaction vertex (leading string). The recorded DOM amplitude, normalized to photoelectrons, is represented as a $60\times 500$ pixel image, with the sensor number (or, equivalently, depth) along the vertical axis and the time in steps of 3.3 ns along the horizontal axis. Clearly visible are the starting point of the event and the subsequent distance-dependent arrival of photons at the DOMs of the string. The total number of recorded photons is 6000. (Modified picture taken from the supplementary material in IceCube:2024nhk.)
  • Figure 5: Example of an attack on a simulated IceCube event (modified images from masterjanik). (a) shows the original image of the measured charge versus time on the leading string. Attacking this image with MiniFool at an attack parameter of $s=0.1$ results in (b). The difference (a)-(b) is shown in (c). Finally, (d) shows the p-value of the applied changes.
  • ...and 3 more figures