Improving Neural Additive Models with Bayesian Principles

Kouroche Bouchiat, Alexander Immer, Hugo Yèche, Gunnar Rätsch, Vincent Fortuin

TL;DR

We develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.

Abstract

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.
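
To make the additive structure concrete, the following is a minimal NumPy sketch of a NAM forward pass: each feature goes through its own small sub-network and the scalar outputs are summed. It is an illustration only, not the authors' implementation. The function names, the hidden width, and the tanh activation are assumptions, and the Bayesian components (Laplace approximation, credible intervals, marginal-likelihood estimation) are left out.

```python
import numpy as np

def init_subnetwork(rng, hidden=16):
    """Parameters of a one-hidden-layer MLP mapping a scalar feature to a scalar.
    The width and initialization scale are illustrative assumptions."""
    return {
        "W1": rng.normal(scale=0.5, size=(1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.5, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def subnetwork_forward(params, x_d):
    """Contribution f_d(x_d) of a single feature; x_d has shape (n,)."""
    h = np.tanh(x_d[:, None] @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"])[:, 0]

def nam_forward(subnets, bias, X):
    """Additive prediction bias + sum_d f_d(x_d), plus the per-feature terms."""
    contributions = np.stack(
        [subnetwork_forward(p, X[:, d]) for d, p in enumerate(subnets)], axis=1
    )
    return bias + contributions.sum(axis=1), contributions

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 samples, 3 features
subnets = [init_subnetwork(rng) for _ in range(X.shape[1])]
prediction, contributions = nam_forward(subnets, 0.1, X)
```

Because every sub-network sees only one feature, each column of `contributions` can be plotted against its feature on its own, which is what makes the per-feature curves in the figures below readable.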

Paper Structure (54 sections, 19 equations, 14 figures, 7 tables)

Figures (14)

  • Figure 1: Regression on a synthetic dataset with known additive structure (see \ref{sec:exptoyexample} and \ref{apd:toyexample} for details). The LA-NAM fits the data well, provides useful uncertainty estimates, and, along with OAK-GP, correctly ignores the uninformative feature ($f_4$, bottom).
  • Figure 2: Risk of mortality and associated epistemic uncertainty ($\pm$ 2 std. deviations) on the MIMIC-III mortality prediction task. The LA-NAM relies on smoother feature curves and provides useful uncertainties while ignoring the uninformative feature.
  • Figure 3: Local explanations from the NAM for the risk of a sample patient in the HiRID mortality task. Features which contribute less than 0.1 to the log-odds magnitude are omitted. The NAM selects a large number of features and does not provide uncertainty estimates.
  • Figure 4: Local explanations from the LA-NAM for the same patient as in Figure 3. Features whose credible intervals overlap with zero are omitted. The model selects far fewer features and provides uncertainty estimates, further aiding interpretation.
  • Figure 5: Feature interactions uncovered in the MIMIC-III dataset by the LA-NAM. (left) Last-layer posterior correlation matrix, used to select the most informative feature-interaction pairs. (right) Two selected example feature interactions and their associated uncertainty.
  • ...and 9 more figures