Neural Additive Experts: Context-Gated Experts for Controllable Model Additivity

Guangzhi Xiong; Sanchit Sinha; Aidong Zhang

TL;DR

Neural Additive Experts (NAEs) introduce a per-feature mixture-of-experts architecture with context-gated routing that interpolates between strictly additive GAMs and models with feature interactions. NAEs keep feature-level interpretability through additive aggregation while allowing flexible, context-dependent feature effects via multiple experts per feature and a softmax gating mechanism, regulated by an expert-variation penalty. Theoretical results establish that NAEs contain GAMs exactly, approximate GA2Ms to arbitrary precision, and become monotonically more additive as the penalty grows, while experiments on synthetic and real-world datasets show competitive accuracy with transparent explanations. The framework provides a principled, tunable trade-off between predictive performance and interpretability, with scalable per-feature explanations and explicit bounds on feature contributions.
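The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each expert is assumed to be a callable of a single feature, and the gate for feature $j$ is assumed to be a linear function of the full input followed by a softmax over that feature's $K$ experts (the paper's gating network may be a deeper network). The names `nae_forward`, `gate_W`, and `gate_b` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nae_forward(x, expert_fns, gate_W, gate_b, bias=0.0):
    """Hypothetical NAE forward pass for one input x of shape (d,).

    expert_fns[j][k]: callable mapping the scalar x_j to expert k's output for feature j.
    gate_W: array (d, K, d); the gate logits of feature j are a linear function of all of x
            (an assumption standing in for the paper's gating network).
    gate_b: array (d, K) of gate biases.
    Returns the prediction and the vector of per-feature contributions.
    """
    d = x.shape[0]
    K = len(expert_fns[0])
    # Expert outputs: every expert of feature j sees only x_j.
    F = np.array([[expert_fns[j][k](x[j]) for k in range(K)] for j in range(d)])  # (d, K)
    # Context gate: logits depend on the full x; softmax over each feature's K experts.
    G = softmax(gate_W @ x + gate_b, axis=1)  # (d, K), each row sums to 1
    contributions = (G * F).sum(axis=1)      # (d,) additive per-feature terms
    return bias + contributions.sum(), contributions


# Usage: 3 features, 2 experts each (a linear and a quadratic expert per feature).
rng = np.random.default_rng(0)
d, K = 3, 2
experts = [[(lambda t, a=a: a * t), (lambda t, a=a: a * t**2)] for a in (1.0, 2.0, -1.0)]
x = np.array([0.5, -1.0, 2.0])
y_hat, phi = nae_forward(x, experts, rng.normal(size=(d, K, d)), rng.normal(size=(d, K)), bias=0.1)
print(y_hat, phi)
```

Because the prediction is a plain sum of the entries of `phi`, each feature's contribution can be read off directly even though the gate weights depend on the whole input.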

Abstract

The trade-off between interpretability and accuracy remains a core challenge in machine learning. Standard Generalized Additive Models (GAMs) offer clear feature attributions but are often constrained by their strictly additive nature, which can limit predictive performance. Introducing feature interactions can boost accuracy yet may obscure individual feature contributions. To address these issues, we propose Neural Additive Experts (NAEs), a novel framework that seamlessly balances interpretability and accuracy. NAEs employ a mixture-of-experts architecture, learning multiple specialized networks per feature, while a dynamic gating mechanism integrates information across features, thereby relaxing rigid additive constraints. Furthermore, we propose targeted regularization techniques to mitigate variance among expert predictions, facilitating a smooth transition from an exclusively additive model to one that captures intricate feature interactions while maintaining clarity in feature attributions. Our theoretical analysis and experiments on synthetic data illustrate the model's flexibility, and extensive evaluations on real-world datasets confirm that NAEs achieve an optimal balance between predictive accuracy and transparent, feature-level explanations. The code is available at https://github.com/Teddy-XiongGZ/NAE.
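The abstract describes a regularizer that reduces variation among a feature's expert predictions. Its exact form is not stated here, so the sketch below assumes one plausible version: $\lambda$ times the variance of the $K$ expert outputs, averaged over samples and features, added to a squared-error task loss. The functions `expert_variation_penalty` and `total_loss` are hypothetical names; the released code may define the penalty differently.

```python
import numpy as np


def expert_variation_penalty(F, lam):
    """Hypothetical expert-variation penalty (assumed form: mean across-expert variance).

    F: array (n, d, K) of expert outputs f_{jk}(x_j) for n samples, d features, K experts.
    lam: penalty strength lambda >= 0.
    Returns lam times the across-expert variance, averaged over samples and features.
    """
    return lam * F.var(axis=2).mean()


def total_loss(y, y_hat, F, lam):
    # Squared-error task loss (an assumption) plus the variation penalty.
    return np.mean((y - y_hat) ** 2) + expert_variation_penalty(F, lam)


# Usage: experts that always agree incur no penalty; disagreeing experts do.
rng = np.random.default_rng(0)
F_agree = np.repeat(rng.normal(size=(4, 3, 1)), 2, axis=2)  # both experts identical
F_split = np.zeros((4, 3, 2))
F_split[..., 1] = 2.0                                        # experts output 0 and 2
print(expert_variation_penalty(F_agree, 0.5))  # 0.0
print(expert_variation_penalty(F_split, 0.5))  # 0.5
```

When all experts of a feature agree, the gate's weighting no longer changes that feature's contribution, which then depends on $x_j$ alone; this is the intuition behind a large penalty pushing the model toward the strictly additive end of the spectrum.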

Paper Structure

This paper contains 44 sections, 10 theorems, 39 equations, 14 figures, and 12 tables.

INTRODUCTION
NEURAL ADDITIVE EXPERTS
  Feature Representation and Expert Networks
  Dynamic Gating and Expert Aggregation
  Training Objective and Regularization
  Interpretability and Feature Attribution
THEORETICAL ANALYSIS OF NAES
  Model classes
  Exact containment of GAMs
  From GA$^2$M to NAE via separable approximation
  Controlling additivity via the expert-variation penalty
EXPERIMENTS ON SIMULATED DATA
  Simulation Setup and Visualization
  Effect of Variation Penalty $\lambda$
EXPERIMENTS ON REAL-WORLD DATA
...and 29 more sections

Key Result

Theorem 1

For any $f\in\textsc{GAM}$, there exists an NAE with $K=1$ expert per feature whose parameters satisfy $\hat{y}(x)=f(x)$ for all $x$.
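One way to see why Theorem 1 holds, assuming the softmax gating described in the TL;DR: with $K=1$, a softmax over a single expert always returns weight 1, so the gate cannot inject any context and each feature's contribution reduces to its lone expert. The numerical check below illustrates this with an assumed linear gate (`gate_W`, `gate_b`) and arbitrary shape functions; `gam` and `nae_k1` are hypothetical helpers, and this is an illustration rather than the paper's proof.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


# A target GAM f(x) = b + sum_j f_j(x_j), with arbitrarily chosen shape functions.
shape_fns = [np.sin, np.square, np.tanh]
b = 0.3


def gam(X):
    return b + sum(f(X[:, j]) for j, f in enumerate(shape_fns))


def nae_k1(X, gate_W, gate_b):
    """NAE with K=1 expert per feature; expert j is set to the GAM's f_j."""
    F = np.stack([f(X[:, j]) for j, f in enumerate(shape_fns)], axis=1)[:, :, None]  # (n, d, 1)
    logits = np.einsum("jki,ni->njk", gate_W, X) + gate_b  # (n, d, 1), depends on all of x
    G = softmax(logits, axis=2)  # softmax over a single expert: every weight is 1
    return b + (G * F).sum(axis=(1, 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W = rng.normal(size=(3, 1, 3))
gb = rng.normal(size=(3, 1))
print(np.allclose(nae_k1(X, W, gb), gam(X)))  # True
```

Whatever random gate parameters are drawn, the output matches the GAM, which is the content of the theorem for this particular construction.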

Figures (14)

Figure 1: Illustration of the Neural Additive Expert (NAE) framework. A gating network dynamically assigns relevance scores to multiple expert predictors for each feature, and the aggregated feature contributions are summed to produce the final prediction. This design maintains interpretability while enabling flexible modeling of complex relationships.
Figure 2: Shape functions learned by NAM (Neural Additive Model) and NAE on simulated data. For the multimodal case, NAE captures the oscillatory structure, while NAM fails to do so.
Figure 3: Effect of the expert variation penalty $\lambda$ on the learned shape function of $x_1$ in NAEs. As $\lambda$ increases, the model transitions from flexible, context-dependent effects to a strictly additive shape function.
Figure 4: Comparison of the Longitude effect on house price predictions across models. The y-axis shows mean-centered feature contributions, background bars indicate normalized data density, and blue dots show actual feature effects. The Longitude-Latitude interaction learned by NAE is also shown.
Figure 5: Shape plots of the Longitude feature learned by NAEs with different $\lambda$ values. Higher $\lambda$ enforces additivity and narrows the range of possible feature effects.
...and 9 more figures
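Figures 4 and 5 report mean-centered feature contributions and the range of possible feature effects. Assuming each feature's contribution is a softmax-weighted, and hence convex, combination of its expert outputs, the contribution at any input lies between the smallest and largest expert output for that feature value, whatever the gate does. The hypothetical helper `feature_effect_summary` below computes mean-centered contributions together with that envelope; it is a sketch, not the plotting code behind the figures.

```python
import numpy as np


def feature_effect_summary(F_j, G_j):
    """Hypothetical helper: attribution summary for a single feature j.

    F_j: array (n, K) of expert outputs f_{jk}(x_j) on n samples.
    G_j: array (n, K) of gate weights; each row is non-negative and sums to 1.
    Returns mean-centered contributions and the lower/upper envelope over experts,
    shifted by the same offset so all three curves share one axis.
    """
    phi = (G_j * F_j).sum(axis=1)     # gated contribution of feature j per sample
    offset = phi.mean()               # mean-centering, as in the shape plots
    lower = F_j.min(axis=1) - offset  # smallest value any gate could produce
    upper = F_j.max(axis=1) - offset  # largest value any gate could produce
    return phi - offset, lower, upper


# Usage with random expert outputs and softmax gates (100 samples, 3 experts).
rng = np.random.default_rng(0)
F_j = rng.normal(size=(100, 3))
logits = rng.normal(size=(100, 3))
G_j = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
phi_c, lower, upper = feature_effect_summary(F_j, G_j)
print(phi_c.mean(), (upper - lower).mean())
```

The width `upper - lower` shrinks to zero when a feature's experts agree, which matches the narrowing of the feature-effect range that Figure 5 reports for larger $\lambda$.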

Theorems & Definitions (19)

Theorem 1: GAM containment
  Proof
Lemma 1: Finite separable approximation of pairwise terms
Lemma 2: Two-expert product construction
  Proof sketch
Theorem 2: GA$^2$M containment up to arbitrary precision
  Proof sketch
Theorem 3: Monotone additivity and additive limit
  Proof sketch
Lemma 3: Restatement of Lemma 1
...and 9 more
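Lemma 1 concerns approximating a pairwise term $h(x_i, x_j)$ by a finite sum of products of univariate functions, $\sum_{r=1}^{R} a_r(x_i)\, b_r(x_j)$. The sketch below illustrates that idea numerically with a truncated SVD of $h$ sampled on a grid, a standard technique swapped in for illustration and not necessarily the construction used in the paper's proof. The example function $\sin(3 x_i x_j)$, the grid, and the ranks tried are arbitrary choices.

```python
import numpy as np

# Pairwise interaction term h(x_i, x_j) sampled on a grid (example choice).
grid = np.linspace(-1.0, 1.0, 200)
Xi, Xj = np.meshgrid(grid, grid, indexing="ij")
H = np.sin(3.0 * Xi * Xj)  # H[p, q] = h(grid[p], grid[q])

U, s, Vt = np.linalg.svd(H)


def separable_approx(R):
    """Rank-R separable approximation sum_r a_r(x_i) * b_r(x_j) on the grid."""
    a = U[:, :R] * s[:R]  # a_r evaluated on the grid: functions of x_i only, (200, R)
    b = Vt[:R, :].T       # b_r evaluated on the grid: functions of x_j only, (200, R)
    return a @ b.T


errors = {R: np.linalg.norm(H - separable_approx(R)) / np.linalg.norm(H) for R in (1, 2, 4, 8)}
for R, err in errors.items():
    print(f"R={R}: relative Frobenius error {err:.2e}")
```

By the Eckart-Young theorem, the Frobenius error of the rank-$R$ truncation cannot increase as $R$ grows, and for a smooth term like this one it falls off quickly, so a small number of separable products already approximates the pairwise term closely.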
