Exploring Energy Landscapes for Minimal Counterfactual Explanations: Applications in Cybersecurity and Beyond

Spyridon Evangelatos, Eleni Veroni, Vasilis Efthymiou, Christos Nikolopoulos, Georgios Th. Papadopoulos, Panagiotis Sarigiannidis

TL;DR

This work addresses the generation of minimal, actionable counterfactual explanations for high-dimensional ML decisions in cybersecurity by introducing an energy-based framework that fuses perturbation theory with statistical mechanics. It leverages a local Taylor expansion to approximate the model, defines a composite energy $E$ and a free energy $\mathcal{F}_{\beta}$ that couples energy with entropy, and then searches for perturbations via Boltzmann sampling and simulated annealing to navigate complex landscapes. The approach yields robust, diverse counterfactuals that respect domain constraints and reveal feature sensitivity near decision boundaries, with IoT cybersecurity experiments illustrating improved interpretability over baseline methods. The proposed methodology offers a principled, scalable pathway to robust explanations with potential extensions to fairness-aware explanations and generative AI contexts, enabling more trustworthy AI-driven security decisions.

Abstract

Counterfactual explanations have emerged as a prominent method in Explainable Artificial Intelligence (XAI), providing intuitive and actionable insights into Machine Learning model decisions. In contrast to traditional feature attribution methods, which assess the importance of input variables, counterfactual explanations identify the minimal changes required to alter a model's prediction, offering a "what-if" analysis that is close to human reasoning. In the context of XAI, counterfactuals enhance transparency, trustworthiness and fairness, offering explanations that are not just interpretable but directly applicable in decision-making processes. In this paper, we present a novel framework that integrates perturbation theory and statistical mechanics to generate minimal counterfactual explanations. We employ a local Taylor expansion of a Machine Learning model's predictive function and reformulate the counterfactual search as an energy minimization problem over a complex landscape. We then model the probability of candidate perturbations with the Boltzmann distribution and use simulated annealing for iterative refinement. Our approach systematically identifies the smallest modifications required to change a model's prediction while maintaining plausibility. Experimental results on benchmark datasets for cybersecurity in Internet of Things environments demonstrate that our method provides actionable, interpretable counterfactuals and offers deeper insights into model sensitivity and decision boundaries in high-dimensional spaces.
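
To make the search procedure in the abstract concrete, the following is a minimal sketch of simulated annealing with Boltzmann (Metropolis) acceptance over perturbations $\Delta\bm{x}$. It is not the paper's implementation. The energy form (a squared-distance penalty weighted by `lam` plus a squared gap to the decision threshold weighted by `mu`), the linear toy model, the cooling schedule, and all function names are assumptions made for illustration; only the values $\lambda=1$, $\mu=1$, $\bm{x}_0=[3,3]$ and $c=0$ are borrowed from the figure captions.

```python
import math
import random


def linear_model(x, weights, bias):
    """Hypothetical toy classifier score f(x) = w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def energy(dx, x0, weights, bias, lam=1.0, mu=1.0, c=0.0):
    """Assumed composite energy, NOT the paper's exact definition.

    For a linear model the first-order Taylor expansion is exact:
    f(x0 + dx) = f(x0) + grad_f . dx, with grad_f = weights.
    """
    f_lin = linear_model(x0, weights, bias) + sum(w * d for w, d in zip(weights, dx))
    distance = sum(d * d for d in dx)          # keeps the perturbation small
    decision_gap = (f_lin - c) ** 2            # pulls the score toward threshold c
    return lam * distance + mu * decision_gap


def simulated_annealing(x0, weights, bias, steps=5000, beta0=0.1,
                        beta_growth=1.002, step_size=0.2, seed=0):
    """Search for a low-energy perturbation with Metropolis acceptance.

    beta is the inverse temperature; it grows geometrically (an assumed
    schedule), so uphill moves become rarer as the search proceeds.
    """
    rng = random.Random(seed)
    dx = [0.0] * len(x0)
    e = energy(dx, x0, weights, bias)
    best_dx, best_e = list(dx), e
    beta = beta0
    for _ in range(steps):
        cand = [d + rng.gauss(0.0, step_size) for d in dx]
        e_cand = energy(cand, x0, weights, bias)
        # Boltzmann rule: always accept downhill, accept uphill
        # with probability exp(-beta * (e_cand - e)).
        if e_cand <= e or rng.random() < math.exp(-beta * (e_cand - e)):
            dx, e = cand, e_cand
            if e < best_e:
                best_dx, best_e = list(dx), e
        beta *= beta_growth
    return best_dx, best_e


if __name__ == "__main__":
    dx, e = simulated_annealing([3.0, 3.0], [1.0, 2.0], 0.0)
    print("perturbation:", dx, "energy:", e)
```

Under this assumed energy, the minimum trades perturbation size against distance to the threshold, so with `mu=1.0` the result moves toward the decision boundary without necessarily crossing it; increasing `mu` relative to `lam` pushes the solution closer to the boundary.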

Paper Structure

This paper contains 8 sections, 31 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: A 3D surface plot of the free energy function $\mathcal{F}_{\beta}(\Delta\bm{x})$ over the perturbation space. The parameters used are $\Delta \bm{x} \in [-10,2]^2$, $\lambda=1.0$, $\mu=1.0$, $w_{i}\sim\mathcal{U}(1,10)$, $\bm{x}_0=[3,3]$, $c=0$, $\alpha=0.01$, $\beta=0.01$, $n=500$ iterations, $\epsilon = 10^{-4}$.
  • Figure 2: Gradient norm convergence for different $\beta$ values. The parameters used were $\Delta \bm{x} \in [-10,2]^2$, $\lambda=1.0$, $\mu=1.0$, $w_{i}\sim\mathcal{U}(1,10)$, $\bm{x}_0=[3,3]$, $c=0$, $\alpha=10^{-3}$, $n=6000$ iterations, $\epsilon = 10^{-4}$.
  • Figure 3: Contour plots illustrating the sensitivity of counterfactual stability to variations in the regularization strength, $\lambda$, and the decision term weight $\mu$ for different values of the inverse temperature $\beta$. Higher stability regions (red) indicate optimal parameter pairs that yield more robust counterfactuals, while lower stability regions (blue) highlight configurations that lead to higher sensitivity to perturbations.
  • Figure 4: Comparison of three counterfactual methods for cybersecurity in IoT environments. Each method's variability is visualized using shaded regions, representing the standard deviation across multiple runs. Larger uncertainty regions indicate greater sensitivity in counterfactual selection.
  • Figure 5: Visualization of the counterfactual decision boundary shift for the three methods used for counterfactual explanations in IoT security applications. The contours in each subplot represent the classifier's decision function, where red regions indicate predictions of malicious traffic and blue regions correspond to benign traffic.
  • ...and 1 more figures