From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

Alexander Nadel, Ron Wettenstein

TL;DR

WOODELF is a fast, unified SHAP framework that combines decision trees, game theory, and Boolean logic to compute Background and Path-Dependent SHAP, as well as Shapley and Banzhaf interaction values, in linear time. By representing models as Weighted Disjunctive Normal Forms (WDNF) and leveraging decision patterns, WOODELF runs efficiently on both CPU and GPU using vectorized Python alone, with no custom C++/CUDA code. The approach yields large speedups on real-world datasets while preserving correctness, validated against SHAP and exact baselines. Overall, WOODELF provides a versatile, scalable interpretability toolkit that handles multiple attribution metrics across ensemble models, paving the way for broader adoption of linear-time, GPU-friendly explainability in practice.

Abstract

SHapley Additive exPlanations (SHAP) is a key tool for interpreting decision tree ensembles by assigning contribution values to features. It is widely used in finance, advertising, medicine, and other domains. Two main approaches to SHAP calculation exist: Path-Dependent SHAP, which leverages the tree structure for efficiency, and Background SHAP, which uses a background dataset to estimate feature distributions. We introduce WOODELF, a SHAP algorithm that integrates decision trees, game theory, and Boolean logic into a unified framework. For each consumer, WOODELF constructs a pseudo-Boolean formula that captures their feature values, the structure of the decision tree ensemble, and the entire background dataset. It then leverages this representation to compute Background SHAP in linear time. WOODELF can also compute Path-Dependent SHAP, Shapley interaction values, Banzhaf values, and Banzhaf interaction values. WOODELF is designed to run efficiently on CPU and GPU hardware alike. Available via the WOODELF Python package, it is implemented using NumPy, SciPy, and CuPy without relying on custom C++ or CUDA code. This design enables fast performance and seamless integration into existing frameworks, supporting large-scale computation of SHAP and other game-theoretic values in practice. For example, on a dataset with 3,000,000 rows, 5,000,000 background samples, and 127 features, WOODELF computed all Background Shapley values in 162 seconds on CPU and 16 seconds on GPU, compared to the 44 minutes required by the best method on any hardware platform, representing 16x and 165x speedups, respectively.
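To make the Background SHAP definition concrete, here is a minimal brute-force sketch in Python. It is not WOODELF's algorithm (it is exponential in the number of features) and only spells out the quantity being computed: Shapley values under a baseline characteristic function, averaged over the background rows. The toy tree, feature names, thresholds, and helper names are all hypothetical.

```python
from itertools import combinations
from math import factorial

def tree_predict(row):
    # Hypothetical toy tree over two features; not a model from the paper.
    if row["age"] < 40:
        return 1.0
    return 5.0 if row["sugar"] < 100 else 2.0

def baseline_value(consumer, baseline, coalition):
    # Baseline characteristic function: features outside the coalition
    # take the baseline row's values before the model is evaluated.
    mixed = {f: (consumer[f] if f in coalition else baseline[f]) for f in consumer}
    return tree_predict(mixed)

def shapley(consumer, baseline):
    # Exact Shapley values against a single baseline row (exponential time).
    features = list(consumer)
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (baseline_value(consumer, baseline, s | {f})
                                   - baseline_value(consumer, baseline, s))
        values[f] = total
    return values

def background_shap(consumer, background):
    # Average the per-baseline Shapley values over all background rows.
    per_row = [shapley(consumer, b) for b in background]
    return {f: sum(r[f] for r in per_row) / len(per_row) for f in consumer}

consumer = {"age": 50, "sugar": 90}
background = [{"age": 30, "sugar": 120}, {"age": 60, "sugar": 80}]
phi = background_shap(consumer, background)
```

In this toy setup the two values in `phi` sum to the consumer's prediction minus the mean prediction over the background rows, which is the usual efficiency property of Shapley values.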

Paper Structure

This paper contains 37 sections, 12 theorems, 33 equations, 3 figures, 10 tables, and 3 algorithms.

Key Result

Theorem 1

For any player $i$, the simplified Banzhaf formula $\beta^{\textit{simplified}}_i$ is equal to the original formula $\beta^{\textit{original}}_i$ when the input is in WDNF.
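The paper's simplified formula is not reproduced on this page, so the following Python sketch only illustrates why a WDNF input admits a cheap per-cube computation. It assumes the standard closed form for a single weighted cube with $k$ literals (each positive variable receives $w / 2^{k-1}$, each negated variable receives $-w / 2^{k-1}$) and relies on the linearity of Banzhaf values across cubes. It then checks that closed form against the exponential-time definition. The example WDNF and all function names are hypothetical.

```python
from itertools import product

# A WDNF as a list of weighted cubes: (weight, positive_vars, negated_vars).
# Hypothetical example; the first cube mirrors 4(age AND NOT sugar).
wdnf = [
    (4.0, {"age"}, {"sugar"}),
    (3.0, {"age", "bmi"}, set()),
    (-2.0, set(), {"bmi"}),
]
players = ["age", "sugar", "bmi"]

def evaluate(assignment):
    # Sum the weights of every cube satisfied by the True/False assignment.
    return sum(
        w for w, pos, neg in wdnf
        if all(assignment[v] for v in pos) and not any(assignment[v] for v in neg)
    )

def banzhaf_brute_force(player):
    # Definition: average marginal contribution over all coalitions of the others.
    others = [p for p in players if p != player]
    total = 0.0
    for bits in product([False, True], repeat=len(others)):
        a = dict(zip(others, bits))
        total += evaluate({**a, player: True}) - evaluate({**a, player: False})
    return total / 2 ** len(others)

def banzhaf_per_cube(player):
    # Assumed closed form: one pass over the cubes, linear in the WDNF size.
    total = 0.0
    for w, pos, neg in wdnf:
        k = len(pos) + len(neg)
        if player in pos:
            total += w / 2 ** (k - 1)
        elif player in neg:
            total -= w / 2 ** (k - 1)
    return total

results = {p: (banzhaf_brute_force(p), banzhaf_per_cube(p)) for p in players}
```

Each entry of `results` holds the brute-force and per-cube values for one player; the two agree on this example, which is the kind of equality Theorem 1 asserts for the paper's own formulas.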

Figures (3)

  • Figure 1: An illustration of how both PB functions and decision trees relate to well-established concepts in game theory. In decision trees, the model represents a game, features serve as players, and the prediction corresponds to profit. Under the baseline characteristic function definition, a missing player (e.g., player 2) is set to its baseline value ($b_2$) before making a prediction. In PB functions, the function itself represents a game, and the variables serve as players. Each variable is True when it participates and False when it is missing.
  • Figure 2: An illustration of the WDNF construction process on a small example. The consumer and baseline values are shown alongside a root-to-leaf path. The table explains how a weighted cube is iteratively constructed from these inputs. To compute the Shapley value contribution of the shown leaf, apply Formula \ref{shap_simplified_formula} to the constructed weighted cube: $4(age \land \neg sugar)$. For Banzhaf values, use Formula \ref{banzhaf_simplified_formula}; for Banzhaf interaction values, use Formula \ref{banzhaf_simplified_formula_interactions}; and for Shapley interaction values, use Table \ref{table_shap_iv}.
  • Figure 3: A simple decision tree illustrating the expected prediction computation, given that $f_1 = 2$ and $f_2$ is missing.
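As a small check on the weighted cube from Figure 2, the Python sketch below computes Shapley values for $4(age \land \neg sugar)$ directly from the definition, averaging marginal contributions over all player orderings. It follows the PB-function convention in Figure 1 (a variable is True when it participates and False when it is missing). It is a brute-force illustration, not the paper's simplified formula, and the function names are hypothetical.

```python
from itertools import permutations

def cube(age, sugar):
    # Weighted cube from Figure 2: 4 * (age AND NOT sugar).
    return 4.0 if (age and not sugar) else 0.0

players = ["age", "sugar"]

def shapley_by_orderings():
    # Average each player's marginal contribution over all join orders.
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        present = {p: False for p in players}  # everyone starts as missing
        for p in order:
            before = cube(**present)
            present[p] = True  # player p joins the coalition
            totals[p] += cube(**present) - before
    return {p: total / len(orders) for p, total in totals.items()}

phi = shapley_by_orderings()
```

Here `age` receives 2 and `sugar` receives -2. The values sum to zero because the cube evaluates to 0 both when every variable participates and when none does.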

Theorems & Definitions (34)

  • Definition 1: Characteristic Function
  • Definition 2: Pseudo-Boolean (PB) Function and Weighted Disjunctive Normal Form (WDNF)
  • Definition 3: PB function's Characteristic Function
  • Definition 4: Shapley Interaction Values
  • Definition 5: Decision Tree
  • Definition 6: Root-to-Leaf Path
  • Definition 7: Decision Pattern
  • Definition 8: Weighted Conjunctive Normal Form (WCNF) \cite{silva2021maxsat}
  • Theorem 1
  • proof
  • ...and 24 more