Gaussian Process Neural Additive Models

Wei Zhang, Brian Barr, John Paisley

TL;DR

This paper proposes Gaussian Process Neural Additive Models (GP-NAM), a new subclass of NAMs that uses a single-layer neural network construction of the Gaussian process via random Fourier features, and demonstrates the performance of GP-NAM on several tabular datasets.

Abstract

Deep neural networks have revolutionized many fields, but their black-box nature occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are required. The recent development of Neural Additive Models (NAMs) is a significant step toward interpretable deep learning for tabular datasets. In this paper, we propose a new subclass of NAMs that uses a single-layer neural network construction of the Gaussian process via random Fourier features, which we call Gaussian Process Neural Additive Models (GP-NAM). GP-NAMs have the advantage of a convex objective function and a number of trainable parameters that grows linearly with feature dimensionality. GP-NAM suffers no loss in performance compared to deeper NAM approaches because GPs are well suited to learning complex non-parametric univariate functions. We demonstrate the performance of GP-NAM on several tabular datasets, showing that it achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters.
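The convexity and linear parameter-growth claims can be illustrated with a small sketch. It assumes an RBF kernel (frequencies $z_s \sim \mathcal{N}(0, 1/\ell^2)$, phases $c_s \sim \mathrm{U}[0, 2\pi]$) with the pairs $(z_s, c_s)$ drawn once and shared across all features, squared loss with an L2 penalty, and a bias fixed to the target mean. The helpers `rff_features`, `fit_gp_nam_ridge` and `predict` are hypothetical, and the paper's own loss, optimizer and treatment of $f_0$ may differ. Because the random features are frozen, the model is linear in the stacked weights, so the penalized objective is strictly convex and has a closed-form minimizer, and the weight count is $D \cdot S$.

```python
import numpy as np

def rff_features(X, z, c):
    """Map each column of X (shape n x D) through the same S random Fourier features.

    Returns an (n, D*S) matrix whose i-th block of S columns is
    sqrt(2/S) * cos(z * x_i + c), so the model is linear in the stacked weights.
    """
    n, D = X.shape
    S = z.size
    proj = X[:, :, None] * z[None, None, :] + c[None, None, :]  # (n, D, S)
    return np.sqrt(2.0 / S) * np.cos(proj).reshape(n, D * S)

def fit_gp_nam_ridge(X, y, S=100, lengthscale=1.0, lam=1e-2, seed=0):
    """Hypothetical GP-NAM regression fit: frozen RFF features + ridge regression."""
    gen = np.random.default_rng(seed)
    z = gen.normal(0.0, 1.0 / lengthscale, size=S)  # assumed RBF spectral samples
    c = gen.uniform(0.0, 2.0 * np.pi, size=S)       # random phases
    Phi = rff_features(X, z, c)
    f0 = float(y.mean())                             # bias fixed to target mean (simplification)
    # Squared loss + L2 penalty is strictly convex in w, so the minimizer is unique
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    w = np.linalg.solve(A, Phi.T @ (y - f0))
    return f0, w.reshape(X.shape[1], S), z, c        # one S-dim weight vector per feature

def predict(X, f0, W, z, c):
    """f0 + sum_i w_i^T phi(x_i), computed as one matrix-vector product."""
    return f0 + rff_features(X, z, c) @ W.ravel()

# Toy additive regression problem with D = 3 features
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(500, 3))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 - 0.5 * X[:, 2] + 0.1 * rng.normal(size=500)

f0, W, z, c = fit_gp_nam_ridge(X, y, S=100, lengthscale=0.5)
mse = float(np.mean((predict(X, f0, W, z, c) - y) ** 2))
print(W.size, round(mse, 4))  # W.size == D * S == 300
```

For classification, replacing the squared loss with the logistic loss keeps the objective convex in the weights, but the closed-form solve must then be replaced by an iterative solver.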

Paper Structure

This paper contains 18 sections, 12 equations, 4 figures, 4 tables, and 1 algorithm.

Figures (4)

  • Figure 1: A graphical representation of the neural additive model. $x_i$ is the $i$th feature of an input vector with $D$ dimensions. $f_0$ is the bias term. $y$ is the response or label. The function $f_i(x_i)$ is the shape function for feature $i$. The sum $f_0 + \sum_i f_i(x_i)$ is used to predict $y$.
  • Figure 2: The architecture of GP-NAM. Each $x_i$ represents one feature of a single input vector $x\in\mathbb{R}^D$. Each pair $(z_s,c_s)$, $s=1,\dots,S$, is shared across all shape functions. The GP $f_{\theta_i}(x_i)$ is the shape function for the $i$th feature. The only trainable parameters are the feature-specific $S$-dimensional weight vectors $w_1,\dots,w_D$ that connect the output of the cosine functions to their corresponding GP shape function. The prediction is the sum of the outputs of all the shape functions plus the bias term $f_0$. This is mathematically equivalent to an additive Gaussian process.
  • Figure 3: Parameter number ratios $|$NAM$|/|$NBM$|$ (orange) and $|$NAM$|/|$GP-NAM$|$ (blue) as a function of data dimensionality. We set $S=100$ basis functions for all models to give a fair comparison. GP-NAM uses $\sim$60x fewer parameters than NAM regardless of the dimensionality of $x$, and, e.g., $\sim$15x fewer than NBM for $x\in\mathbb{R}^{40}$. We are interested in the tabular data regime, e.g., $D<500$.
  • Figure 4: Shape functions of NAM, NODE-GAM and GP-NAM on the LCD data set in the original scales. The density of each feature in the training data is plotted in pink. For reference, the weights learned by logistic regression are indicated by the slope of the light dashed line in each plot. Inspection shows that GP-NAM is in fairly close agreement with linear classification, while still allowing meaningful nonlinearities to be learned from the data (DTI in particular).
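Figure 2 states that GP-NAM is mathematically equivalent to an additive Gaussian process. The quick numerical check below assumes an RBF kernel for every feature and independent standard normal priors on each $w_i$ (assumptions, not stated on this page). Under them, the covariance implied by the concatenated features, $\sum_i \phi(x_i)^\top\phi(x_i')$, should approach the additive kernel $\sum_i k(x_i, x_i')$ as $S$ grows, even though every feature reuses the same $(z_s, c_s)$. The helpers `rff_map` and `additive_rbf` are hypothetical names.

```python
import numpy as np

def rff_map(x, z, c):
    """Scalar RFF map: sqrt(2/S) * cos(z_s * x + c_s) for s = 1..S."""
    return np.sqrt(2.0 / z.size) * np.cos(z * x + c)

def additive_rbf(x, xp, lengthscale):
    """Exact additive kernel: sum over features of 1-D RBF kernels."""
    return float(np.sum(np.exp(-((x - xp) ** 2) / (2.0 * lengthscale ** 2))))

rng = np.random.default_rng(0)
S, D, ell = 50_000, 4, 0.7
z = rng.normal(0.0, 1.0 / ell, size=S)      # frequencies shared by all features
c = rng.uniform(0.0, 2.0 * np.pi, size=S)   # phases shared by all features

x, xp = rng.normal(size=D), rng.normal(size=D)
# GP-NAM feature vector: concatenate one S-dim block per input feature
phi_x = np.concatenate([rff_map(xi, z, c) for xi in x])
phi_xp = np.concatenate([rff_map(xi, z, c) for xi in xp])

approx = float(phi_x @ phi_xp)   # Cov(f(x), f(x')) when each w_i ~ N(0, I_S)
exact = additive_rbf(x, xp, ell)
print(f"approx={approx:.3f} exact={exact:.3f}")
```

The Monte Carlo error of each per-feature block shrinks roughly as $1/\sqrt{S}$, which is why a large $S$ is used for this check. A model meant for training would use a much smaller $S$, such as the $S=100$ in Figure 3.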

Theorems & Definitions (2)

  • Definition 1: Gaussian process
  • Definition 2: RFF Approximation
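Definitions 1 and 2 are listed only by name here. For reference, their standard textbook forms are sketched below for a scalar input $x$, matching the per-feature shape functions. We assume, but cannot confirm from this page, that the paper's definitions agree with these up to notation. The RBF line is one example kernel, not necessarily the one used in the paper.

```latex
% Definition 1 (standard form): Gaussian process
f \sim \mathcal{GP}(m, k) \iff
\big(f(x_1), \dots, f(x_n)\big) \sim \mathcal{N}(\mathbf{m}, K),\quad
\mathbf{m}_j = m(x_j),\; K_{jl} = k(x_j, x_l),\ \text{for every finite } \{x_1, \dots, x_n\}.

% Definition 2 (standard random Fourier feature form): shift-invariant kernel k
% with spectral density p(z); draw z_s \sim p(z) and c_s \sim \mathrm{U}[0, 2\pi]
k(x, x') = \mathbb{E}_{z, c}\big[2\cos(zx + c)\cos(zx' + c)\big]
\approx \frac{2}{S}\sum_{s=1}^{S}\cos(z_s x + c_s)\cos(z_s x' + c_s)
= \phi(x)^\top \phi(x'),\qquad
\phi_s(x) = \sqrt{2/S}\,\cos(z_s x + c_s).

% Consequence: f(x) = w^\top \phi(x) with w \sim \mathcal{N}(0, I_S) is approximately f \sim \mathcal{GP}(0, k).
% RBF example: k(x, x') = \exp\!\big(-(x - x')^2 / (2\ell^2)\big), \quad p(z) = \mathcal{N}(0, \ell^{-2}).
```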