Enforcing Fundamental Relations via Adversarial Attacks on Input Parameter Correlations
Timo Saala, Lucie Flek, Alexander Jung, Akbar Karimi, Alexander Schmidt, Matthias Schott, Philipp Soldin, Christopher Wiebusch
TL;DR
The paper presents the Random Distribution Shuffle Attack (RDSA), a correlation-focused adversarial method that preserves marginal feature distributions while perturbing inter-feature relationships to mislead classifiers. By resampling selected features according to finely binned one-dimensional distributions, RDSA degrades model performance primarily through altered correlations rather than distributional shifts, and its effect is quantified with metrics such as the fooling ratio (FR), the Jensen-Shannon divergence (JSD), and the correlation difference $\big\langle c_c \big\rangle$. The authors demonstrate RDSA on six diverse tasks, including CERN Open Data physics problems, weather forecasting, MNIST digit recognition, HAR, and MIMIC-IV mortality prediction, and report high fooling ratios with minimal changes to the marginal distributions. They also show that data augmentation via RDSA can improve AUROC, often outperforming LPF and competing with CTGAN/TVAE. The work highlights a practical path to robustness and domain-appropriate data augmentation by emphasizing correlation structure, with limitations tied to data availability and the handling of categorical features, and suggests future directions in uncertainty modeling and higher-order statistics.
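The resampling idea summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_marginal`, `sample_marginal`, `resample_features`, `jsd_bits`), the bin count, and the choice to pool all samples when building each histogram are assumptions made for illustration only. Each selected feature is redrawn independently from its own finely binned one-dimensional histogram, which approximately preserves the marginal while breaking its correlation with the other features.

```python
import numpy as np


def fit_marginal(values, n_bins=100):
    """Finely binned 1D histogram of one feature: bin edges and bin probabilities."""
    counts, edges = np.histogram(values, bins=n_bins)
    return edges, counts / counts.sum()


def sample_marginal(edges, probs, size, rng):
    """Pick a bin according to its probability, then a uniform point inside that bin."""
    idx = rng.choice(len(probs), size=size, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])


def resample_features(X, feature_idx, n_bins=100, rng=None):
    """Redraw the selected columns from their own marginals (hypothetical helper).

    Assumption: histograms are built from all rows of X; a per-class variant
    would fit and sample within each class separately.
    """
    rng = np.random.default_rng() if rng is None else rng
    X_adv = np.array(X, dtype=float)  # float copy, leaves X untouched
    for j in feature_idx:
        edges, probs = fit_marginal(X_adv[:, j], n_bins)
        X_adv[:, j] = sample_marginal(edges, probs, X_adv.shape[0], rng)
    return X_adv


def jsd_bits(p, q):
    """Jensen-Shannon divergence (base 2) between two histograms on the same bins."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy data: two strongly correlated features.
rng = np.random.default_rng(0)
x0 = rng.normal(size=20_000)
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=x0.size)])

X_adv = resample_features(X, feature_idx=[1], rng=rng)

corr_before = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
corr_after = np.corrcoef(X_adv[:, 0], X_adv[:, 1])[0, 1]

# Compare the marginal of feature 1 before and after on a common binning.
edges, p_orig = fit_marginal(X[:, 1])
p_adv, _ = np.histogram(X_adv[:, 1], bins=edges)
marginal_jsd = jsd_bits(p_orig, p_adv.astype(float))
```

In this toy setup the correlation between the two columns is destroyed for the resampled feature while its marginal histogram stays close to the original, which mirrors the distinction the summary draws between altered correlations and distributional shifts. Whether a classifier is fooled by such samples depends on how much it relies on the correlation in question.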
Abstract
Correlations between input parameters play a crucial role in many scientific classification tasks, since these are often related to fundamental laws of nature. For example, in high energy physics, one of the common deep learning use-cases is the classification of signal and background processes in particle collisions. In many such cases, the fundamental principles of the correlations between observables are better understood than the actual distributions of the observables themselves. In this work, we present a new adversarial attack algorithm called Random Distribution Shuffle Attack (RDSA), emphasizing the correlations between observables in the network rather than individual feature characteristics. Correct application of the proposed attack can result in a significant improvement in classification performance, particularly in the context of data augmentation, when the generated adversaries are used within adversarial training. Given that correlations between input features are also crucial in many other disciplines, we demonstrate the effectiveness of RDSA on six classification tasks, including two particle collision challenges (using CERN Open Data), hand-written digit recognition (MNIST784), human activity recognition (HAR), weather forecasting (Rain in Australia), and ICU patient mortality (MIMIC-IV), showing a general use case beyond fundamental physics for this new type of adversarial attack algorithm.
