Improving Neural Additive Models with Bayesian Principles

Kouroche Bouchiat; Alexander Immer; Hugo Yèche; Gunnar Rätsch; Vincent Fortuin

TL;DR

We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and on challenging real-world medical tasks.

Abstract

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
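
Ingredients (a) and (b) of the abstract, per-sub-network credible intervals and feature selection through the marginal likelihood, can be illustrated where the Laplace approximation is exact: a model that is linear in its last-layer weights with Gaussian noise. The following is a minimal sketch under that assumption, not the LA-NAM implementation. Fixed RBF features (the hypothetical `feature_basis`) stand in for each trained feature network's last layer or linearized Jacobian. Each feature d gets its own prior precision `alphas[d]`, and these precisions, together with the noise precision `beta`, are tuned with MacKay's fixed-point updates of the log marginal likelihood (a swapped-in optimizer chosen for brevity). An uninformative feature should end up with a large precision, which shrinks its shape function toward zero.

```python
import numpy as np

def feature_basis(x, centers, width=0.5):
    """Hypothetical stand-in for one feature network's last-layer features:
    Gaussian RBFs of a single input column, shape (n, k)."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def design(X, centers):
    """Stack the per-feature bases; slices[d] indexes feature d's weights."""
    k = len(centers)
    Phi = np.hstack([feature_basis(X[:, d], centers) for d in range(X.shape[1])])
    slices = [slice(d * k, (d + 1) * k) for d in range(X.shape[1])]
    return Phi, slices

def posterior(Phi, y, alphas, slices, beta):
    """Gaussian posterior over the weights, one prior precision alpha_d per feature."""
    prior_prec = np.empty(Phi.shape[1])
    for a, s in zip(alphas, slices):
        prior_prec[s] = a
    H = beta * Phi.T @ Phi + np.diag(prior_prec)  # posterior precision
    Sigma = np.linalg.inv(H)
    mean = beta * Sigma @ Phi.T @ y
    return mean, Sigma, H, prior_prec

def log_evidence(Phi, y, alphas, slices, beta):
    """Log marginal likelihood log p(y | alphas, beta) of the linear-Gaussian model."""
    mean, _, H, prior_prec = posterior(Phi, y, alphas, slices, beta)
    resid = y - Phi @ mean
    _, logdet_H = np.linalg.slogdet(H)
    return (0.5 * len(y) * np.log(beta / (2 * np.pi))
            - 0.5 * beta * resid @ resid
            - 0.5 * np.sum(prior_prec * mean ** 2)
            + 0.5 * np.sum(np.log(prior_prec))
            - 0.5 * logdet_H)

def empirical_bayes(Phi, y, slices, n_iter=200):
    """MacKay fixed-point updates of per-feature prior precisions and noise precision."""
    PtP = Phi.T @ Phi
    alphas, beta = np.ones(len(slices)), 1.0
    for _ in range(n_iter):
        mean, Sigma, _, _ = posterior(Phi, y, alphas, slices, beta)
        M = beta * Sigma @ PtP
        # gamma_d = sum_i (1 - alpha_d * Sigma_ii) over feature d's weights, computed
        # as a block trace of beta * Sigma @ Phi^T Phi to avoid cancellation
        gammas = np.array([np.trace(M[s, s]) for s in slices])
        norms = np.array([mean[s] @ mean[s] for s in slices])
        alphas = np.clip(gammas / (norms + 1e-12), 1e-6, 1e6)
        resid = y - Phi @ mean
        beta = (len(y) - gammas.sum()) / (resid @ resid)
    return alphas, beta

def credible_band(x_grid, centers, d, mean, Sigma, slices, z=2.0):
    """Posterior mean of f_d on a grid with a +/- z standard-deviation band."""
    B = feature_basis(x_grid, centers)
    s = slices[d]
    f = B @ mean[s]
    var = np.einsum("ij,jk,ik->i", B, Sigma[s, s], B)
    sd = np.sqrt(np.maximum(var, 0.0))
    return f, f - z * sd, f + z * sd

# Toy data: feature 2 is uninformative, mirroring the ignored feature in Figure 1.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(200)
y = y - y.mean()  # centering plays the role of the NAM's global bias term
centers = np.linspace(-2, 2, 10)
Phi, slices = design(X, centers)

lml_init = log_evidence(Phi, y, np.ones(3), slices, 1.0)
alphas, beta = empirical_bayes(Phi, y, slices)
lml_final = log_evidence(Phi, y, alphas, slices, beta)
mean, Sigma, _, _ = posterior(Phi, y, alphas, slices, beta)

grid = np.linspace(-2, 2, 50)
f0, lo0, hi0 = credible_band(grid, centers, 0, mean, Sigma, slices)
f2, lo2, hi2 = credible_band(grid, centers, 2, mean, Sigma, slices)
print("alpha per feature:", np.round(alphas, 2))
print("log evidence: %.1f -> %.1f" % (lml_init, lml_final))
```

In this sketch the selection is implicit: the evidence trades data fit against the log-determinant terms, so a feature network that only fits noise is not worth its extra effective parameters, and raising its precision switches it off without a separate pruning step.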

Paper Structure

This paper contains 54 sections, 19 equations, 14 figures, and 7 tables.

Introduction
Main contributions.
Related Work
Neural additive models.
Bayesian neural networks.
Additive Gaussian processes.
Laplace-Approximated NAM
Bayesian Neural Additive Model
Linearized Laplace Approximation
Feature Network Selection
Feature Network Predictive
Feature Network Interaction
Experiments
Illustrative Example
UCI Regression and Classification
...and 39 more sections

Figures (14)

Figure 1: Regression on a synthetic dataset with known additive structure (see the Illustrative Example section and the corresponding appendix for details). The LA-NAM fits the data well, provides useful uncertainty estimates, and, along with OAK-GP, correctly ignores the uninformative feature ($f_4$, bottom).
Figure 2: Risk of mortality and associated epistemic uncertainty ($\pm$ 2 std. deviations) on the MIMIC-III mortality prediction task. The LA-NAM relies on smoother feature curves and provides useful uncertainties while ignoring the uninformative feature.
Figure 3: Local explanations from the NAM for the mortality risk of a sample patient in the HiRID mortality task. Features that contribute less than 0.1 in magnitude to the log-odds are omitted. The NAM selects a large number of features and does not provide uncertainty estimates.
Figure 4: Local explanations from the LA-NAM for the same patient as in Figure 3. Features whose credible intervals overlap with zero are omitted. The model selects far fewer features and provides uncertainty estimates, further aiding interpretation.
Figure 5: Feature interactions uncovered in the MIMIC-III dataset by the LA-NAM. (left) Last-layer posterior correlation matrix, used to select the most informative feature-interaction pairs. (right) Two selected example feature interactions and their associated uncertainty.
...and 9 more figures
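
Figure 5 describes using the last-layer posterior correlation matrix to pick feature pairs for second-order interaction terms. The exact ranking criterion is not stated in this section, so the sketch below assumes one plausible choice: score each pair by the mutual information between the two feature networks' weight blocks under a joint Gaussian posterior, computed from the correlation matrix (mutual information is invariant to rescaling each weight, so correlations suffice). The posterior comes from a hypothetical Bayesian linear model with fixed RBF features and fixed hyperparameters `alpha` and `beta`, not from a trained LA-NAM; features 0 and 2 are built to be nearly collinear so that their weight posteriors are strongly coupled.

```python
import itertools
import numpy as np

def rbf_basis(x, centers, width=0.5):
    """Hypothetical stand-in for a feature network's last-layer features."""
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

def correlation(Sigma):
    """Posterior correlation matrix, the quantity shown in Figure 5 (left)."""
    sd = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(sd, sd)

def block_mutual_information(R, s_d, s_e):
    """Mutual information (nats) between two blocks of a jointly Gaussian vector:
    0.5 * (log|R_dd| + log|R_ee| - log|R_[de],[de]|)."""
    idx = np.r_[np.arange(s_d.start, s_d.stop), np.arange(s_e.start, s_e.stop)]
    _, ld_d = np.linalg.slogdet(R[s_d, s_d])
    _, ld_e = np.linalg.slogdet(R[s_e, s_e])
    _, ld_de = np.linalg.slogdet(R[np.ix_(idx, idx)])
    return 0.5 * (ld_d + ld_e - ld_de)

def rank_pairs(Sigma, slices):
    """Score every feature pair and sort from most to least coupled."""
    R = correlation(Sigma)
    scores = {(d, e): block_mutual_information(R, slices[d], slices[e])
              for d, e in itertools.combinations(range(len(slices)), 2)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy posterior: features 0 and 2 are nearly collinear, feature 1 is independent.
rng = np.random.default_rng(1)
n, centers = 300, np.linspace(-2, 2, 8)
x0 = rng.uniform(-2, 2, n)
X = np.column_stack([x0, rng.uniform(-2, 2, n), x0 + 0.1 * rng.standard_normal(n)])
k = len(centers)
Phi = np.hstack([rbf_basis(X[:, d], centers) for d in range(X.shape[1])])
slices = [slice(d * k, (d + 1) * k) for d in range(X.shape[1])]
alpha, beta = 1.0, 100.0  # fixed hyperparameters, chosen only for illustration
Sigma = np.linalg.inv(beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1]))

ranking = rank_pairs(Sigma, slices)
for pair, score in ranking:
    print(pair, round(score, 3))
```

Under this assumed criterion, the top-ranked pairs would be the candidates for dedicated two-input sub-networks during fine-tuning; a correlation-based score such as the largest absolute cross-block entry of `R` would be a cheaper alternative.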
