ORACLE: Explaining Feature Interactions in Neural Networks with ANOVA

Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh

TL;DR

ORACLE reframes neural network explanations as an ANOVA-style surrogate learned on a discretized input grid to produce orthogonal main- and pairwise-interaction maps with $L^2$-consistency to the backbone. It defines target interaction strengths $S^{\star}_{jk}$ and provides a principled projection-based surrogate whose maps and strengths converge to the oracle as grid resolution and data increase. Empirically, ORACLE yields more faithful and stable interaction rankings and localization than SHAP-based baselines on synthetic and real tabular benchmarks, with demonstrated DoE-style interpretability and cross-backbone transfer; latent-domain results suggest limits in highly entangled representations. The work establishes a principled bridge between classical ANOVA/DoE and modern neural explanations, and outlines extensions to higher-order interactions, adaptive grids, and hybrid methods for broader model classes and applications.

Abstract

We introduce ORACLE, a framework that explains neural networks on tabular and scientific design data. It fits ANOVA-style main and pairwise interaction effects to a model's prediction surface. ORACLE treats a trained network as a black-box response, learns an orthogonal factorial surrogate on a discretized input grid, and uses simple centering and $\mu$-rebalancing steps to obtain main- and interaction-effect tables that remain $L^2$-consistent with the original model. The resulting grid-based interaction maps are easy to visualize, comparable across backbones, and directly connected to classical design-of-experiments analyses. On synthetic factorial and low- to medium-dimensional tabular regression benchmarks, ORACLE more accurately recovers ground-truth ANOVA interactions and hotspot structure than Monte Carlo SHAP-family interaction methods, as measured by ranking, localization, and cross-backbone stability metrics. In latent image and text settings, the experiments instead delineate ORACLE's natural scope: our results indicate that grid-based ANOVA surrogates are most effective when features admit interpretable factorial structure, which makes ORACLE particularly well suited to scientific and engineering tabular workflows that require stable, DoE-style interaction summaries.
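
To make the centering idea concrete, the following is a minimal sketch of a classical two-factor ANOVA decomposition of a black-box response evaluated on a balanced grid of bin centers. It is not the ORACLE procedure itself: the function `black_box`, the helper name `anova_two_way`, the uniform (balanced) grid weighting, and the use of the mean squared interaction value as a "strength" are all illustrative assumptions, and the $\mu$-rebalancing step is not modeled.

```python
import numpy as np

def anova_two_way(table):
    """Decompose an (L1, L2) response table on a balanced grid into a grand
    mean, two main-effect vectors, and a pairwise interaction table."""
    table = np.asarray(table, dtype=float)
    mu = table.mean()                              # grand mean
    m1 = table.mean(axis=1) - mu                   # main effect of factor 1
    m2 = table.mean(axis=0) - mu                   # main effect of factor 2
    g12 = table - mu - m1[:, None] - m2[None, :]   # centered interaction table
    # Assumed strength summary: mean squared interaction over the grid.
    strength = float(np.mean(g12 ** 2))
    return mu, m1, m2, g12, strength

def black_box(x1, x2):
    # Hypothetical stand-in for a trained network's prediction surface.
    return 1.0 + 2.0 * x1 - x2 + 3.0 * x1 * x2

# Five bin centers per factor, evaluated on the full factorial grid.
centers = np.linspace(-1.0, 1.0, 5)
grid = black_box(centers[:, None], centers[None, :])
mu, m1, m2, g12, strength = anova_two_way(grid)
```

On this toy surface the decomposition recovers the additive terms in `m1` and `m2` and isolates the product term in `g12`, whose rows and columns each average to zero. With unbalanced or empirical bin weights, plain averaging no longer yields an orthogonal decomposition, which is presumably where a weighted projection such as the one the paper describes becomes necessary.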

Paper Structure

This paper contains 68 sections, 15 theorems, 131 equations, 3 figures, 13 tables, 1 algorithm.

Key Result

Proposition 4.4

The minimizer $\beta^\star$ in Equation eq:oracle-pop-projection exists and is unique under standard conditions, and the corresponding function $f^L$ admits a discrete ANOVA decomposition with $\mu^L\in\mathcal{H}^L_0$, $m^L_j\in\mathcal{H}^L_j$, and $g^L_{jk}\in\mathcal{H}^L_{jk}$. Moreover, $f^L$ is the $L^2(P_Z)$-orthogonal projection of $f(X)$ onto $\mathcal{H}^L_0 \oplus (\bigoplus_j \mathca

Figures (3)

  • Figure 1: Grouped bar plots of mean interaction metrics across datasets. Each panel shows one metric (NDCG@K, Peak-IoU@q, Xfer-NDCG@K, CCC, IG@K,B with $K\!=\!5$, $q\!=\!0.10$, $B\!=\!3$); within each dataset, bars compare ORACLE to SHAP-family interaction methods. Higher values indicate better interaction detection, localization, transfer, scale agreement, or intervention utility.
  • Figure 2: Classical ORACLE main-effect plots for the Airfoil dataset (Backbone A). Each panel shows the marginal response $\mu + m_j(x_j)$ as a function of the bin centers for feature $j$, with all other features integrated out under the empirical input distribution.
  • Figure 3: Classical ORACLE interaction plots for the Airfoil dataset (Backbone A). Rows and columns correspond to factors; each upper-triangular panel shows the predicted SPL across bins of the column factor for three representative bins (Bin 0, Bin 2, Bin 4) of the row factor. Non-parallel or crossing lines indicate strong non-additive interactions, while nearly parallel lines correspond to approximately additive behaviour.
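
The ranking metric named in Figure 1 can be illustrated with a short sketch of NDCG@K applied to interaction strengths. This uses the standard linear-gain, log2-discount form with ground-truth strengths as relevance scores; whether the paper uses exactly this variant is an assumption, and the function name `ndcg_at_k` and the toy values are hypothetical.

```python
import numpy as np

def ndcg_at_k(true_strength, est_strength, k):
    """NDCG@K of the ranking induced by est_strength, scored against
    true_strength used as (non-negative) relevance values."""
    true_strength = np.asarray(true_strength, dtype=float)
    est_strength = np.asarray(est_strength, dtype=float)
    k = min(k, true_strength.size)
    # Position discounts 1/log2(rank + 1) for ranks 1..k.
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    est_order = np.argsort(-est_strength, kind="stable")[:k]
    ideal_order = np.argsort(-true_strength, kind="stable")[:k]
    dcg = float(np.sum(true_strength[est_order] * discounts))
    idcg = float(np.sum(true_strength[ideal_order] * discounts))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: four feature pairs with known and estimated strengths.
true_s = [3.0, 2.0, 1.0, 0.0]
same_order = ndcg_at_k(true_s, [0.9, 0.8, 0.1, 0.0], k=2)
reversed_order = ndcg_at_k(true_s, [0.0, 1.0, 2.0, 3.0], k=2)
```

An estimate that ranks the pairs in the same order as the ground truth scores 1, while a reversed ranking scores well below 1, matching the caption's note that higher values indicate better interaction detection.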

Theorems & Definitions (41)

  • Definition 4.1: Oracle ANOVA decomposition
  • Definition 4.2: Oracle ANOVA interaction map and strength
  • Remark 4.3
  • Proposition 4.4: Discrete ANOVA projection
  • Remark 4.5
  • Proposition 4.6: Finite-sample convergence
  • Theorem 4.7: Consistency of discrete interaction strengths
  • Theorem 4.8: Top-$K$ selection consistency
  • Corollary 4.9: NDCG@$K$ consistency
  • Proposition 4.10: Homomorphism to classical $2^5$ factorial ANOVA
  • ...and 31 more