From Deep Additive Kernel Learning to Last-Layer Bayesian Neural Networks via Induced Prior Approximation
Wenyuan Zhao, Haoyuan Chen, Tie Liu, Rui Tuo, Chao Tian
TL;DR
This paper tackles the scalability and uncertainty-estimation challenges of Deep Kernel Learning (DKL) by introducing the Deep Additive Kernel (DAK) model, which reinterprets the last-layer GP as a sparse Bayesian neural network via induced prior approximation on 1-D grids. By feeding hierarchical NN features into an additive GP and applying a sparse, grid-based prior to each GP unit, DAK achieves linear-time inference with a closed-form ELBO and predictive distribution for regression, while retaining DKL-style interpretability. Empirically, DAK outperforms state-of-the-art DKL methods on regression and image classification tasks and scales better to high-dimensional feature spaces. The work bridges GPs and NNs through a last-layer Bayesian lens and points toward more general kernels and variational families for scalable kernel learning.
Abstract
Combining the strengths of deep learning and kernel methods such as Gaussian processes (GPs), Deep Kernel Learning (DKL) has gained considerable attention in recent years. Computationally, however, DKL becomes challenging when the input dimension of the GP layer is high. To address this challenge, we propose the Deep Additive Kernel (DAK) model, which incorporates (i) an additive structure for the last-layer GP and (ii) an induced prior approximation for each GP unit. This naturally leads to a last-layer Bayesian neural network (BNN) architecture. The proposed method enjoys the interpretability of DKL as well as the computational advantages of BNNs. Empirical results show that the proposed approach outperforms state-of-the-art DKL methods in both regression and classification tasks.
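To make the two ingredients concrete, the following is a minimal NumPy sketch of an additive last layer in which each 1-D GP unit is replaced by its induced prior approximation on a grid U: g(x) ≈ k(x, U) K_UU^{-1} g(U) = φ(x)^T z, with K_UU = L L^T, φ(x) = L^{-1} k(U, x), and z ~ N(0, I). The Gaussian weights z are what give the layer its BNN form. The Laplace kernel, the evenly spaced grid, the dense Cholesky factorization, the random stand-in for NN features, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def laplace_kernel(a, b, lengthscale=1.0):
    """Laplace (exponential) kernel between two sets of 1-D points."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)

def induced_features(x, grid, lengthscale=1.0, jitter=1e-8):
    """Features phi(x) = L^{-1} k(grid, x), where K_UU = L L^T (hypothetical helper)."""
    k_uu = laplace_kernel(grid, grid, lengthscale) + jitter * np.eye(grid.size)
    chol = np.linalg.cholesky(k_uu)               # lower-triangular L
    k_ux = laplace_kernel(grid, x, lengthscale)   # shape (m, n)
    return np.linalg.solve(chol, k_ux).T          # shape (n, m)

def additive_last_layer(h, grid, z, lengthscale=1.0):
    """Sum of per-dimension GP units; h is (n, d) features, z is (d, m) weights."""
    out = np.zeros(h.shape[0])
    for p in range(h.shape[1]):
        out += induced_features(h[:, p], grid, lengthscale) @ z[p]
    return out

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 9)        # assumed 1-D inducing grid
h = np.tanh(rng.normal(size=(5, 3)))    # stand-in for NN features in (-1, 1)
z = rng.normal(size=(3, grid.size))     # one draw of the N(0, I) weights
sample = additive_last_layer(h, grid, z)
```

Each call to `additive_last_layer` with a fresh draw of `z` yields one sample of the approximate additive GP at the given features. On the grid itself the approximation reproduces the kernel, since φ(U)^T φ(U) = K_UU.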
