Support Collapse of Deep Gaussian Processes with Polynomial Kernels for a Wide Regime of Hyperparameters
Daryna Chernobrovkina, Steffen Grünewälder
TL;DR
This work analyzes the priors induced by deep Gaussian processes (DGPs) with polynomial kernels, revealing depth-dependent averaging that makes the priors highly sensitive to layer hyperparameters. Using a Berry-Esseen framework, it derives a uniform approximation of a DGP by the simple form $S e^Y (g_1(x))^{c_1}$ and establishes explicit bounds based on log-moments, yielding a threshold near $\sigma \approx 1.88$ that governs whether increasing depth drives the prior mass toward zero or toward large norms. The results extend from linear to polynomial kernels, providing concrete Berry-Esseen bounds for both identically and non-identically distributed layer outputs, and illustrating with quadratic-kernel examples how the interaction of factors determines the prior's behavior. The findings help reconcile observed pathologies with the practical performance of DGPs and suggest principled directions for hyperparameter tuning and kernel design, while outlining open questions for extending the theory to broader kernel classes. Overall, the paper offers a quantitative lens on why DGP priors can collapse or explode and how this depends on depth and kernel structure, connecting to related insights on convolutional DGPs in the literature.
Abstract
We analyze the prior that a Deep Gaussian Process with polynomial kernels induces. We observe that, even for relatively small depths, averaging effects occur within such a Deep Gaussian Process and that the prior can be analyzed and approximated effectively by means of the Berry-Esseen theorem. One of the key findings of this analysis is that, in the absence of careful hyperparameter tuning, the prior of a Deep Gaussian Process either collapses rapidly towards zero as the depth increases or places negligible mass on low-norm functions. This aligns well with experimental findings and mirrors known results for convolution-based Deep Gaussian Processes.
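
The collapse-versus-explosion dichotomy can be illustrated in a simplified setting that is an assumption of this sketch, not the paper's construction: a one-dimensional deep GP with linear kernel $k(x, x') = \sigma^2 x x'$, in which each layer multiplies its input by an independent $N(0, \sigma^2)$ weight. The log-magnitude of the output is then a sum of i.i.d. terms with mean $\log\sigma - (\gamma + \log 2)/2$, where $\gamma$ is the Euler-Mascheroni constant. This mean changes sign at $\sigma = e^{(\gamma + \log 2)/2} \approx 1.887$, which is consistent with the threshold near $1.88$. The function names below are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def critical_sigma():
    """Scale at which E[log|w|] = 0 for w ~ N(0, sigma^2).

    Uses E[log|Z|] = -(gamma + log 2) / 2 for Z ~ N(0, 1).
    """
    return float(np.exp((EULER_GAMMA + np.log(2.0)) / 2.0))


def simulate_log_scale(sigma, depth, n_samples=20000, seed=0):
    """Sample log|w_1 * ... * w_depth| with i.i.d. w_i ~ N(0, sigma^2).

    Assumed toy model: a 1-D deep GP with linear kernel, so the
    composition of `depth` layers is x times a product of Gaussian weights.
    """
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, sigma, size=(n_samples, depth))
    # Sum of logs instead of log of product, to avoid underflow/overflow.
    return np.log(np.abs(weights)).sum(axis=1)


if __name__ == "__main__":
    print("critical sigma:", critical_sigma())
    for sigma in (1.0, critical_sigma(), 3.0):
        logs = simulate_log_scale(sigma, depth=20)
        # Negative mean: mass concentrates near zero; positive: large norms.
        print(f"sigma={sigma:.3f}  mean log-scale={logs.mean():.2f}")
```

In this toy model the mean log-scale grows linearly in depth with slope $\log\sigma - (\gamma + \log 2)/2$, so any fixed $\sigma$ away from the critical value pushes the typical output magnitude exponentially toward zero or toward infinity as layers are added.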
