Exploring Energy Landscapes for Minimal Counterfactual Explanations: Applications in Cybersecurity and Beyond

Spyridon Evangelatos; Eleni Veroni; Vasilis Efthymiou; Christos Nikolopoulos; Georgios Th. Papadopoulos; Panagiotis Sarigiannidis

TL;DR

This work addresses the generation of minimal, actionable counterfactual explanations for high-dimensional ML decisions in cybersecurity by introducing an energy-based framework that fuses perturbation theory with statistical mechanics. It leverages a local Taylor expansion to approximate the model, defines a composite energy E and a free energy Fβ that couples energy with entropy, and then searches for perturbations via Boltzmann sampling and simulated annealing to navigate complex landscapes. The approach yields robust, diverse counterfactuals that respect domain constraints and reveal feature sensitivity near decision boundaries, with IoT cybersecurity experiments illustrating improved interpretability over baseline methods. The proposed methodology offers a principled, scalable pathway to robust explanations with potential extensions to fairness-aware explanations and generative AI contexts, enabling more trustworthy AI-driven security decisions.
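The TL;DR couples energy with entropy through a free energy Fβ and weights candidate perturbations by Boltzmann sampling. The exact definitions of E and Fβ are not given in this summary, so the following is a minimal sketch that assumes the standard statistical-mechanics forms. Over a finite set of candidate perturbations with energies E_i, the Boltzmann weights are p_i = exp(−βE_i)/Z, and Fβ = −(1/β) log Z, which is identical to ⟨E⟩ − S/β, where S is the entropy of the weights. The function names are hypothetical and not taken from the paper.

```python
import numpy as np


def boltzmann_weights(energies, beta):
    """Boltzmann probabilities p_i = exp(-beta * E_i) / Z over candidate perturbations."""
    e = np.asarray(energies, dtype=float)
    # Shift by the minimum energy before exponentiating for numerical stability;
    # the shift cancels when the weights are normalized.
    w = np.exp(-beta * (e - e.min()))
    return w / w.sum()


def free_energy(energies, beta):
    """F_beta = -(1/beta) * log Z, computed with a stable log-sum-exp."""
    e = np.asarray(energies, dtype=float)
    e_min = e.min()
    log_z = -beta * e_min + np.log(np.exp(-beta * (e - e_min)).sum())
    return -log_z / beta


def entropy(p):
    """Shannon entropy S = -sum p_i log p_i (zero-probability terms contribute 0)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


if __name__ == "__main__":
    energies = [0.5, 1.0, 2.0]  # illustrative energies of three candidate perturbations
    for beta in (0.5, 2.0, 20.0):
        p = boltzmann_weights(energies, beta)
        # Larger beta (lower temperature) concentrates the weight on the
        # lowest-energy, i.e. most minimal, candidate.
        print(beta, p, free_energy(energies, beta))
```

The parameter β trades off the two terms. At small β, the entropy term dominates and the weights stay spread over diverse candidates. At large β, Fβ approaches the minimum energy and sampling focuses on the smallest perturbation.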

Abstract

Counterfactual explanations have emerged as a prominent method in Explainable Artificial Intelligence (XAI), providing intuitive and actionable insights into Machine Learning model decisions. In contrast to traditional feature attribution methods, which assess the importance of input variables, counterfactual explanations identify the minimal changes required to alter a model's prediction, offering a "what-if" analysis that is close to human reasoning. Within XAI, counterfactuals enhance transparency, trustworthiness and fairness, yielding explanations that are not only interpretable but also directly applicable to decision-making processes. In this paper, we present a novel framework that integrates perturbation theory and statistical mechanics to generate minimal counterfactual explanations. We employ a local Taylor expansion of a Machine Learning model's predictive function and reformulate the counterfactual search as an energy minimization problem over a complex landscape. We then model the probability of candidate perturbations with the Boltzmann distribution and use simulated annealing for iterative refinement. Our approach systematically identifies the smallest modifications required to change a model's prediction while maintaining plausibility. Experimental results on benchmark cybersecurity datasets for Internet of Things (IoT) environments demonstrate that our method provides actionable, interpretable counterfactuals and offers deeper insights into model sensitivity and decision boundaries in high-dimensional spaces.
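The search pipeline described above can be sketched as follows: a first-order Taylor approximation, an energy over perturbations, Boltzmann (Metropolis) acceptance, and simulated annealing. This sketch is illustrative and is not the authors' implementation. Several pieces are assumptions: the toy logistic model, the energy (squared L2 size of the perturbation plus a hinge penalty on the linearized logit), and every parameter value (`lam`, `margin`, the geometric cooling schedule, `step_size`). The paper's domain-constraint and plausibility terms are omitted.

```python
import numpy as np


class ToyLogisticModel:
    """Hypothetical stand-in for a trained classifier with logit f(x) = w.x + b."""

    def __init__(self, w, b):
        self.w = np.asarray(w, dtype=float)
        self.b = float(b)

    def logit(self, x):
        return float(self.w @ np.asarray(x, dtype=float) + self.b)

    def grad_logit(self, x):
        # Input gradient of the logit; constant here because the logit is linear.
        return self.w.copy()


def energy(delta, base_logit, grad, target_sign, lam=10.0, margin=0.1):
    """Assumed energy: squared L2 size of the perturbation plus a hinge penalty
    that is zero once the first-order Taylor estimate of the logit has crossed
    the decision boundary by at least `margin` in the target direction."""
    approx_logit = base_logit + grad @ delta  # f(x + d) ~ f(x) + grad f(x) . d
    violation = max(0.0, margin - target_sign * approx_logit)
    return float(delta @ delta) + lam * violation


def anneal_counterfactual(model, x, t0=1.0, t_min=1e-3, cooling=0.95,
                          steps_per_t=50, step_size=0.1, seed=0):
    """Simulated annealing over perturbations with Metropolis acceptance."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    base_logit = model.logit(x)
    grad = model.grad_logit(x)  # expand once around the original input
    target_sign = -1.0 if base_logit > 0 else 1.0  # aim for the opposite class

    delta = np.zeros_like(x)
    e = energy(delta, base_logit, grad, target_sign)
    best_delta, best_e = delta.copy(), e
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            cand = delta + rng.normal(scale=step_size, size=x.shape)
            e_cand = energy(cand, base_logit, grad, target_sign)
            # Accept downhill moves always; accept uphill moves with the
            # Boltzmann probability exp(-dE / T).
            if e_cand < e or rng.random() < np.exp(-(e_cand - e) / t):
                delta, e = cand, e_cand
                if e < best_e:
                    best_delta, best_e = delta.copy(), e
        t *= cooling  # geometric cooling schedule
    return best_delta


if __name__ == "__main__":
    model = ToyLogisticModel(w=[1.0, -2.0, 0.5], b=0.2)
    x = np.array([1.0, 0.0, 0.5])
    d = anneal_counterfactual(model, x)
    print("original logit:", model.logit(x))
    print("perturbation:", d)
    print("counterfactual logit:", model.logit(x + d))
```

The toy model is linear, so the Taylor estimate is exact. For a nonlinear classifier, the approximation holds only near x. Any returned perturbation should therefore be checked against the true model, possibly re-expanding around the new point, before it is reported as a counterfactual.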
