
Space Explanations of Neural Network Classification

Faezeh Labbaf, Tomáš Kolárik, Martin Blicha, Grigory Fedyukovich, Michael Wand, Natasha Sharygina

TL;DR

This work introduces Space Explanations, a logic-based framework for explaining neural network classifications over continuous input spaces using Craig interpolation and unsatisfiable-core techniques. It defines Space Explanations and Impact Space to express sufficient, relationship-aware conditions for class decisions, and presents an algorithmic framework with Generalize, Reduce, and Capture strategies to generate flexible, scalable explanations. The authors implement a prototype, SpEXplAIn, and demonstrate its effectiveness on real benchmarks (heart_attack, obesity, MNIST), showing more expressive explanations and favorable scaling compared to state-of-the-art approaches. Overall, the method yields provably correct, boundary-approximating explanations that capture feature interactions, with practical potential for explainable AI in sensitive domains.

Abstract

We present a novel logic-based concept called Space Explanations for classifying neural networks that gives provable guarantees of the behavior of the network in continuous areas of the input feature space. To automatically generate space explanations, we leverage a range of flexible Craig interpolation algorithms and unsatisfiable core generation. Based on real-life case studies ranging from small to large in size, we demonstrate that the generated explanations are more meaningful than those computed by state-of-the-art approaches.
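To make the idea of a guarantee over a continuous area of the input space concrete, here is a minimal sketch. It is not the paper's method: it assumes a hypothetical two-feature linear classifier and box-shaped regions, and it verifies the guarantee with plain interval arithmetic rather than Craig interpolation or unsatisfiable cores. All names (`score_bounds`, `is_sufficient_for_positive`, `WEIGHTS`, `BIAS`) are illustrative only.

```python
from typing import List, Tuple

# One (low, high) interval per input feature; an axis-aligned region of the input space.
Box = List[Tuple[float, float]]


def score_bounds(weights: List[float], bias: float, box: Box) -> Tuple[float, float]:
    """Return the exact minimum and maximum of w.x + b over the box."""
    lo = hi = bias
    for w, (a, b) in zip(weights, box):
        # A non-negative weight attains its minimum at the lower end of the
        # interval; a negative weight attains it at the upper end.
        lo += w * (a if w >= 0 else b)
        hi += w * (b if w >= 0 else a)
    return lo, hi


def is_sufficient_for_positive(weights: List[float], bias: float, box: Box) -> bool:
    """True if every point in the box is classified positive (score > 0)."""
    lo, _ = score_bounds(weights, bias, box)
    return lo > 0


# Hypothetical toy classifier (an assumption): positive iff 2*x0 - x1 - 1 > 0.
WEIGHTS = [2.0, -1.0]
BIAS = -1.0

# Candidate regions: the narrow box lies entirely on the positive side,
# the wide box crosses the decision boundary.
narrow_box: Box = [(2.0, 3.0), (0.0, 1.0)]
wide_box: Box = [(0.0, 3.0), (0.0, 1.0)]

if __name__ == "__main__":
    print(is_sufficient_for_positive(WEIGHTS, BIAS, narrow_box))
    print(is_sufficient_for_positive(WEIGHTS, BIAS, wide_box))
```

In this toy setting, a region that passes the check is a sufficient condition for the class decision: every input inside it receives the same label. Boxes cannot express relationships between features, which is the kind of limitation the relationship-aware explanations described above are meant to address.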

Paper Structure

This paper contains 9 sections, 3 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Example of close approximations of non-trivial decision boundaries using meaningful explanations of the classification: a NN classifier of heart attack risk heart_disease_45.
  • Figure 2: Algorithmic framework: starting from an initial space explanation $\varphi_0$, the impact space is expanded and a logically weaker explanation $\varphi$ is produced. The user may select from multiple strategies for further optimization (cf. the dashed arrow).
  • Figure 3: Projections into selected pairs of features of interpolation-based explanations (G) of the heart-attack risk model, possibly combined with reduction ($\textbf{R}_{\mathit{min}}$).
  • Figure 4: Comparing projections of interpolation-based (Generalize, G) and interval explanations ($\textbf{I} \circ \textbf{A}$) into selected pairs of features of the heart-attack risk model.
  • Figure 5: Approximation of decision boundaries and feature relations using strategies C and $\textbf{R}_{\mathit{min}} \circ \textbf{C}$ within selected pairs of features of the heart-attack risk model.
  • ...and 4 more figures

Theorems & Definitions (3)

  • Definition 1: Space Explanation, Impact Space
  • Example 1
  • Example 2