Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity

Guangzhi Xiong, Sanchit Sinha, Aidong Zhang

TL;DR

Neural Additive Experts (NAEs) introduce a per-feature mixture-of-experts architecture with context-gated routing to interpolate between strictly additive GAMs and interacting models. NAEs maintain feature-level interpretability through additive aggregation while enabling flexible, context-dependent feature effects via multiple experts and a softmax gating mechanism, regulated by an expert-variation penalty. Theoretical results show GAM containment, GA2M approximation, and monotone control of additivity as the penalty grows, while experiments demonstrate competitive accuracy with transparent explanations on synthetic and real-world datasets. The framework provides a principled, tunable trade-off between predictive performance and interpretability, with scalable per-feature explanations and explicit bounds on feature contributions.

Abstract

The trade-off between interpretability and accuracy remains a core challenge in machine learning. Standard Generalized Additive Models (GAMs) offer clear feature attributions but are often constrained by their strictly additive nature, which can limit predictive performance. Introducing feature interactions can boost accuracy yet may obscure individual feature contributions. To address these issues, we propose Neural Additive Experts (NAEs), a novel framework that seamlessly balances interpretability and accuracy. NAEs employ a mixture of experts framework, learning multiple specialized networks per feature, while a dynamic gating mechanism integrates information across features, thereby relaxing rigid additive constraints. Furthermore, we propose targeted regularization techniques to mitigate variance among expert predictions, facilitating a smooth transition from an exclusively additive model to one that captures intricate feature interactions while maintaining clarity in feature attributions. Our theoretical analysis and experiments on synthetic data illustrate the model's flexibility, and extensive evaluations on real-world datasets confirm that NAEs achieve an optimal balance between predictive accuracy and transparent, feature-level explanations. The code is available at https://github.com/Teddy-XiongGZ/NAE.
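The architecture described above (several experts per feature, a gate that sees all features, and an additive sum of per-feature contributions) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the one-hidden-layer tanh experts, the linear gate, and the names `init_params` and `nae_forward` are all assumptions made for the example, and the linked repository may differ.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d, K, hidden=8, seed=0):
    """Random parameters for d features and K experts per feature (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    return {
        # One tiny MLP per (feature, expert): scalar x_i -> scalar.
        "W1": rng.normal(size=(d, K, hidden)),
        "b1": rng.normal(size=(d, K, hidden)),
        "w2": rng.normal(size=(d, K, hidden)) / hidden,
        # Assumed linear gate: full input x -> K logits for each feature.
        "Wg": rng.normal(size=(d, d, K)),  # (input dim, feature, expert)
        "bias": 0.0,
    }

def nae_forward(params, X):
    """X has shape (n, d). Returns prediction, contributions, expert outputs, gate weights."""
    # Each expert sees only its own feature x_i.
    h = np.tanh(X[:, :, None, None] * params["W1"] + params["b1"])  # (n, d, K, hidden)
    experts = (h * params["w2"]).sum(-1)                            # (n, d, K)
    # The gate sees the whole input, which is what lets context enter.
    logits = np.einsum("nj,jik->nik", X, params["Wg"])              # (n, d, K)
    gates = softmax(logits, axis=-1)                                # rows sum to 1
    # Per-feature contribution is a convex combination of that feature's experts.
    contrib = (gates * experts).sum(-1)                             # (n, d)
    y_hat = contrib.sum(-1) + params["bias"]                        # additive aggregation
    return y_hat, contrib, experts, gates

params = init_params(d=3, K=4)
X = np.random.default_rng(1).normal(size=(5, 3))
y_hat, contrib, experts, gates = nae_forward(params, X)
```

Because the gate weights are a softmax, each feature's contribution always lies between the smallest and largest output of its own experts, which is one way to read the "explicit bounds on feature contributions" mentioned in the TL;DR.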

Paper Structure (44 sections, 10 theorems, 39 equations, 14 figures, 12 tables)


Key Result

Theorem 1

For any $f\in\textsc{GAM}$, there exist parameters of an NAE with $K=1$ such that $\hat{y}(x)=f(x)$ for all $x$.
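The intuition can be written out in one step. The notation below is ours rather than quoted from the paper: we assume $K$ counts the experts per feature, that the prediction is a bias plus gated per-feature expert outputs, and that the gate is a softmax over logits $z_{ik}(x)$.

$$
\hat{y}(x) = b + \sum_{i=1}^{d}\sum_{k=1}^{K} g_{ik}(x)\, f_{ik}(x_i),
\qquad
g_{ik}(x) = \frac{\exp\big(z_{ik}(x)\big)}{\sum_{k'=1}^{K}\exp\big(z_{ik'}(x)\big)}.
$$

With $K=1$ the softmax has a single entry, so $g_{i1}(x)=1$ for every $x$ and the gate drops out:

$$
\hat{y}(x) = b + \sum_{i=1}^{d} f_{i1}(x_i).
$$

This is exactly the GAM form, so matching a given $f(x)=\beta_0+\sum_i f_i(x_i)$ only requires $b=\beta_0$ and $f_{i1}=f_i$, provided the expert class is able to represent each shape function $f_i$.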

Figures (14)

  • Figure 1: Illustration of the Neural Additive Expert (NAE) framework. A gating network dynamically assigns relevance scores to multiple expert predictors for each feature, and the aggregated feature contributions are summed to produce the final prediction. This design maintains interpretability while enabling flexible modeling of complex relationships.
  • Figure 2: Shape functions learned by NAM and NAE on simulated data. For the multimodal case, NAE captures the oscillatory structure, while NAM fails.
  • Figure 3: Effect of the expert variation penalty $\lambda$ on the learned shape function of $x_1$ in NAEs. As $\lambda$ increases, the model transitions from flexible to strictly additive.
  • Figure 4: Comparison of Longitude effect on house price predictions across models. Y-axis shows mean-centered feature contributions. Background bars indicate normalized data density. Blue dots show actual feature effects. Longitude-Latitude interaction learned by NAE is also shown.
  • Figure 5: Shape plots of the Longitude feature learned by NAEs with different $\lambda$ values. Higher $\lambda$ enforces additivity and narrows the range of possible feature effects.
  • ...and 9 more figures
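Figures 3 and 5 both describe the same mechanism: as $\lambda$ grows, the experts for a feature are pushed to agree, and the range of possible feature effects narrows. The sketch below illustrates that relationship under stated assumptions. The penalty is taken to be the variance across experts (the exact form used in the paper is not given in this overview), and `shrink_toward_mean` is a hypothetical stand-in for the effect of training with a larger $\lambda$, not a training procedure.

```python
import numpy as np

def expert_variation_penalty(experts):
    """experts: (n, d, K). Variance across experts, averaged over samples and features."""
    return experts.var(axis=-1).mean()

def contribution_bounds(experts):
    """A softmax-weighted mix of K experts lies between their min and max."""
    return experts.min(axis=-1), experts.max(axis=-1)

def shrink_toward_mean(experts, t):
    """Hypothetical proxy for a larger lambda: pull experts toward their mean, t in [0, 1]."""
    mean = experts.mean(axis=-1, keepdims=True)
    return mean + (1.0 - t) * (experts - mean)

rng = np.random.default_rng(0)
experts = rng.normal(size=(100, 3, 4))  # synthetic expert outputs, not real model output

for t in (0.0, 0.5, 1.0):
    e = shrink_toward_mean(experts, t)
    lo, hi = contribution_bounds(e)
    # The penalty and the mean width of the feasible band shrink together.
    print(t, expert_variation_penalty(e), (hi - lo).mean())
```

When the experts coincide, the band collapses to a single curve per feature and the gate no longer matters, which corresponds to the strictly additive end of the transition shown in the figures.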

Theorems & Definitions (19)

  • Theorem 1: GAM containment
  • proof
  • Lemma 1: Finite separable approximation of pairwise terms
  • Lemma 2: Two-expert product construction
  • Proof sketch
  • Theorem 2: GA$^2$M containment up to arbitrary precision
  • Proof sketch
  • Theorem 3: Monotone additivity and additive limit
  • Proof sketch
  • Lemma 3: Restatement of Lemma 1
  • ...and 9 more
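The lemma titles suggest how a gated mix of per-feature experts can express interactions: a pairwise term is approximated by a finite sum of separable products, and a product can be realised with two experts. The paper's actual construction is not reproduced in this overview, so the following is only one illustrative way to obtain a product term, with all names hypothetical. Two experts on $x_1$ output $x_1$ and $0$, and a gate that depends on $x_2$ puts weight $x_2$ on the first expert, giving the contribution $x_1 x_2$ for $x_2\in(0,1)$.

```python
import numpy as np

def two_expert_contribution(x1, x2):
    """Contribution of feature x1 when the gate depends on x2 (illustrative only)."""
    # Experts see only x1.
    expert_a = x1                  # a(x1) = x1
    expert_b = np.zeros_like(x1)   # b(x1) = 0
    # Gate sees x2. A two-way softmax over logits (z, 0) equals sigmoid(z).
    z = np.log(x2) - np.log1p(-x2)         # logit(x2), valid for 0 < x2 < 1
    w_a = 1.0 / (1.0 + np.exp(-z))         # weight on expert a, equal to x2
    return w_a * expert_a + (1.0 - w_a) * expert_b

rng = np.random.default_rng(0)
x1 = rng.uniform(-2.0, 2.0, size=1000)
x2 = rng.uniform(0.05, 0.95, size=1000)

# Largest deviation from the exact product over the sample.
err = np.max(np.abs(two_expert_contribution(x1, x2) - x1 * x2))
```

More generally, a mix $w\,a(x_1) + (1-w)\,b(x_1)$ equals $b(x_1) + w\,\big(a(x_1)-b(x_1)\big)$, so any gate weight $w$ that depends on another feature multiplies a function of $x_1$, while the result is still reported as a contribution of $x_1$.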