Globally Interpretable Classifiers via Boolean Formulas with Dynamic Propositions

Reijo Jaakkola, Tomi Janhunen, Antti Kuusisto, Masood Feyzbakhsh Rankooh, Miikka Vilander

TL;DR

This work addresses the challenge of globally interpretable binary classification on tabular data by extracting short Boolean formulas using a declarative ASP(DL) framework that discretizes numeric attributes on the fly. It introduces a dynamic formula-length method with pivot and interval discretization variants, implemented in a three-layer ASP-based architecture, and compares against state-of-the-art baselines such as XGBoost and random forests on seven UCI datasets. The results demonstrate that these dynamic, compact formulas can achieve competitive accuracy while remaining immediately human-readable, illustrating a favorable trade-off between interpretability and predictive performance. Overall, the approach advances explainable AI by delivering globally interpretable classifiers that rival black-box models in accuracy and offer ready-to-understand explanations for each decision.

Abstract

Interpretability and explainability are among the most important challenges of modern artificial intelligence and are even mentioned in various legislative sources. In this article, we develop a method for extracting immediately human-interpretable classifiers from tabular data. The classifiers are given in the form of short Boolean formulas built with propositions that can either be directly extracted from categorical attributes or dynamically computed from numeric ones. Our method is implemented using Answer Set Programming. We investigate seven datasets and compare our results to those obtainable by state-of-the-art classifiers for tabular data, namely XGBoost and random forests. Over all datasets, the accuracies obtainable by our method are similar to those of the reference methods. The advantage of our classifiers in all cases is that they are very short and immediately human-intelligible, as opposed to the black-box nature of the reference methods.
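
To make the idea of a short Boolean formula over categorical and dynamically computed numeric propositions concrete, here is a minimal Python sketch. It is not the authors' Answer Set Programming implementation: the toy data, the attribute names (`age`, `smoker`), the helper names, and the exhaustive midpoint search for a pivot are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Tuple, Union

Row = Dict[str, Union[str, float]]
Prop = Callable[[Row], bool]


def categorical(attr: str, value: str) -> Prop:
    """Proposition read directly off a categorical attribute."""
    return lambda row: row[attr] == value


def above_pivot(attr: str, pivot: float) -> Prop:
    """Proposition computed from a numeric attribute: true when value > pivot."""
    return lambda row: float(row[attr]) > pivot


def either(p: Prop, q: Prop) -> Prop:
    """Disjunction of two propositions."""
    return lambda row: p(row) or q(row)


def accuracy(formula: Prop, rows: List[Row], labels: List[bool]) -> float:
    """Fraction of rows on which the formula agrees with the label."""
    hits = sum(formula(r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def best_pivot(attr: str, rows: List[Row], labels: List[bool]) -> Tuple[float, float]:
    """Try midpoints between consecutive distinct values; return (accuracy, pivot)."""
    values = sorted({float(r[attr]) for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct values to choose a pivot")
    return max((accuracy(above_pivot(attr, p), rows, labels), p) for p in candidates)


# Hypothetical toy data: one numeric and one categorical attribute.
rows: List[Row] = [
    {"age": 25, "smoker": "no"},
    {"age": 40, "smoker": "yes"},
    {"age": 52, "smoker": "no"},
    {"age": 31, "smoker": "yes"},
    {"age": 60, "smoker": "yes"},
    {"age": 35, "smoker": "no"},
]
labels = [False, True, True, True, True, False]

# Pick the pivot for the numeric attribute from the data itself.
acc, pivot = best_pivot("age", rows, labels)

# A two-proposition formula: (age > pivot) OR (smoker = yes).
formula = either(above_pivot("age", pivot), categorical("smoker", "yes"))
print(f"pivot={pivot}, single-proposition accuracy={acc:.3f}")
print(f"formula accuracy={accuracy(formula, rows, labels):.3f}")
```

The resulting classifier can be read off directly as "age > pivot or smoker = yes", which is the kind of immediate readability the abstract refers to. A declarative ASP encoding would search over formulas and pivots jointly rather than fixing the formula shape by hand as done here.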

Paper Structure (16 sections, 6 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Three Layers of the Implementation
  • Figure 2: The average test accuracies and standard deviations obtained for each dataset with the pivoted FSM, random forests, XGBoost and the median FSM. Where available, we also include the accuracies reported as Baseline Model Performance in the UCI repository, although these are not directly comparable because the technical details behind the UCI numbers are unknown.

Theorems & Definitions (1)

  • Example 1