Deep Q-Exponential Processes

Zhi Chang, Chukwudi Obite, Shuang Zhou, Shiwei Lan

TL;DR

This paper generalizes the Q-exponential process (Q-EP) to a deep Q-EP that combines proper regularization with improved expressiveness, and demonstrates its numerical advantages over multiple state-of-the-art deep probabilistic models.

Abstract

Motivated by deep neural networks, the deep Gaussian process (DGP) generalizes the standard GP by stacking multiple layers of GPs. Despite the enhanced expressiveness, the GP, as an $L_2$ regularization prior, tends to be over-smooth and sub-optimal for inhomogeneous subjects, such as images with edges. Recently, the Q-exponential process (Q-EP) has been proposed as an $L_q$ relaxation of GP and shown to have more desirable regularization properties through a parameter $q>0$, with $q=2$ corresponding to GP. Sharing similar tractability of posterior and predictive distributions with GP, Q-EP can also be stacked to improve its modeling flexibility. In this paper, we generalize Q-EP to deep Q-EP to enjoy both proper regularization and improved expressiveness. The generalization is realized by introducing shallow Q-EP as a latent variable model and then building a hierarchy of shallow Q-EP layers. Sparse approximation by inducing points and a scalable variational strategy are applied to facilitate inference. We demonstrate the numerical advantages of the proposed deep Q-EP model by comparing it with multiple state-of-the-art deep probabilistic models.
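To make the role of $q$ concrete, the following is a minimal sketch of a finite-dimensional q-exponential log-density. The density form used here is an assumption, taken from the earlier Q-EP literature as commonly stated; this page does not reproduce it. The helper names `qed_logpdf_diag` and `gaussian_logpdf_diag` are hypothetical, and the covariance is restricted to a diagonal matrix to keep the example dependency-free. Under this assumed form, $q=2$ recovers the Gaussian log-density, while a smaller $q$ penalizes large deviations less.

```python
import math


def qed_logpdf_diag(u, mu, var, q):
    """Log-density of an N-dim q-exponential distribution with diagonal covariance.

    Assumed form (not stated on this page):
        p(u) = (q/2) (2*pi)^(-N/2) |C|^(-1/2) r^((q/2 - 1) N/2) exp(-r^(q/2) / 2),
    with r = (u - mu)^T C^{-1} (u - mu). Requires u != mu so that r > 0.
    """
    n = len(u)
    # Squared Mahalanobis distance for a diagonal covariance matrix.
    r = sum((ui - mi) ** 2 / vi for ui, mi, vi in zip(u, mu, var))
    log_det = sum(math.log(vi) for vi in var)
    return (
        math.log(q / 2.0)
        - 0.5 * n * math.log(2.0 * math.pi)
        - 0.5 * log_det
        + (q / 2.0 - 1.0) * 0.5 * n * math.log(r)
        - 0.5 * r ** (q / 2.0)
    )


def gaussian_logpdf_diag(u, mu, var):
    """Reference log-density of independent Gaussians, for comparison at q = 2."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * vi) - 0.5 * (ui - mi) ** 2 / vi
        for ui, mi, vi in zip(u, mu, var)
    )


if __name__ == "__main__":
    u, mu, var = [10.0, 10.0], [0.0, 0.0], [1.0, 1.0]
    for q in (2.0, 1.25, 1.0):
        print(q, qed_logpdf_diag(u, mu, var, q))
```

In this sketch the point far from the mean receives a higher log-density as $q$ decreases from 2, which is one way to read the abstract's claim that the $L_q$ prior is less prone to over-smoothing than the $L_2$ (GP) prior.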

Paper Structure

This paper contains 22 sections, 3 theorems, 59 equations, 5 figures, 2 tables.

Key Result

Proposition 2.1

If ${\bf u}\sim \mathrm{q}\!-\!\mathrm{ED}_N(\boldsymbol{\mu}, {\bf C})$, then we have

Figures (5)

  • Figure 1: 2d latent space of multi-phase oil-flow dataset: contrasting GP-LVM (top row) with two shallow Q-EPs for $q=1.25$ (middle row) and $q=1$ (bottom row).
  • Figure 2: Comparing deep Q-EP with cutting-edge deep models including deep GP, DKL-GP and DSPP on modeling a 2d-output time series. Mean absolute errors (MAE) on testing data are 0.0494 (shallow GP), 0.0578 (shallow Q-EP), 0.0442 (deep GP), 0.0444 (deep Q-EP), 0.0536 (DKL-GP) and 0.0896 (DSPP), respectively.
  • Figure 3: Comparing shallow (1-layer), deep (2-layer) and deeper (3-layer) Q-EPs with GP, deep GP, DKL-GP and DSPP on a classification problem defined on an annular rhombus. Circles, upper triangles and lower triangles label the three classes in the training data. Classification accuracies on testing data are $81.04\%$ (GP), $82.2\%$ (Deep GP), $76.4\%$ (DKL-GP), $78.88\%$ (DSPP), $83.4\%$ (Q-EP), $85.64\%$ (Deep Q-EP) and $87.2\%$ (Deeper Q-EP), respectively.
  • Figure 4: Comparing DKL-QEP and DKL-GP with CNN on two benchmark classification problems.
  • Figure C.1: Comparing Q-EP and deep Q-EP with GP, deep GP, DKL-GP and DSPP on a classification problem defined on an annular rhombus.

Theorems & Definitions (11)

  • Definition 1
  • Proposition 2.1
  • Definition 2
  • Remark 1
  • Definition 3
  • Theorem 2.1
  • Theorem 3.1
  • proof
  • Remark 2
  • Remark 3
  • ...and 1 more