Relative Information Gain and Gaussian Process Regression
Hamish Flynn
TL;DR
The paper introduces the relative information gain $\gamma_n(\eta,\beta)$, a complexity measure for kernel-based regression that interpolates between the effective dimension and the information gain and has the same growth rate as the effective dimension. It derives a localised PAC-Bayesian excess risk bound for Gaussian process regression, in which the relative information gain arises naturally from the complexity term, and establishes upper bounds on $\gamma_n(\eta,\beta)$ in terms of the spectral decay of Mercer kernels. Combining these bounds with the excess risk bound yields minimax-optimal rates of convergence, stated explicitly for polynomial and exponential eigenvalue decay. The results connect the spectral properties of the kernel to learning rates and provide a principled framework for risk guarantees in GP regression with fixed design. The work clarifies how complexity measures in RKHS settings relate to one another and gives theory-backed rates that depend on the kernel spectrum.
Abstract
The sample complexity of estimating or maximising an unknown function in a reproducing kernel Hilbert space is known to be linked to both the effective dimension and the information gain associated with the kernel. While the information gain has an attractive information-theoretic interpretation, the effective dimension typically results in better rates. We introduce a new quantity called the relative information gain, which measures the sensitivity of the information gain with respect to the observation noise. We show that the relative information gain smoothly interpolates between the effective dimension and the information gain, and that the relative information gain has the same growth rate as the effective dimension. In the second half of the paper, we prove a new PAC-Bayesian excess risk bound for Gaussian process regression. The relative information gain arises naturally from the complexity term in this PAC-Bayesian bound. We prove bounds on the relative information gain that depend on the spectral properties of the kernel. When these upper bounds are combined with our excess risk bound, we obtain minimax-optimal rates of convergence.
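To make the two classical quantities concrete, the following minimal sketch computes the information gain and the effective dimension from the eigenvalues of a kernel matrix. It uses the common textbook forms $\frac{1}{2}\log\det(I + K/\eta)$ and $\mathrm{tr}(K(K+\eta I)^{-1})$; the normalisation, the RBF kernel, the lengthscale, the design points, and all function names are assumptions made for illustration and may differ from the conventions in the paper. The paper's definition of the relative information gain is not reproduced here, so the sketch does not compute it. It only checks numerically the standard identity that the derivative of the information gain with respect to the noise level $\eta$ equals $-d_{\mathrm{eff}}(\eta)/(2\eta)$, which illustrates why sensitivity to the observation noise is tied to the effective dimension.

```python
import numpy as np


def rbf_kernel_matrix(x, lengthscale=0.2):
    """Kernel matrix of a 1-D RBF kernel (illustrative choice, not from the paper)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def information_gain(eigvals, eta):
    """0.5 * log det(I + K / eta), computed from the eigenvalues of K."""
    return 0.5 * np.sum(np.log1p(eigvals / eta))


def effective_dimension(eigvals, eta):
    """tr(K (K + eta I)^{-1}), computed from the eigenvalues of K."""
    return np.sum(eigvals / (eigvals + eta))


# Hypothetical fixed design: 50 equally spaced points on [0, 1].
x = np.linspace(0.0, 1.0, 50)
K = rbf_kernel_matrix(x)

# K is symmetric PSD; clip tiny negative eigenvalues caused by round-off.
eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)

eta = 0.1  # assumed noise / regularisation level
ig = information_gain(eigvals, eta)
d_eff = effective_dimension(eigvals, eta)

# Central finite difference of the information gain with respect to eta.
h = 1e-6
fd = (information_gain(eigvals, eta + h) - information_gain(eigvals, eta - h)) / (2 * h)

print(f"information gain:      {ig:.4f}")
print(f"effective dimension:   {d_eff:.4f}")
print(f"d(info gain)/d(eta):   {fd:.4f}")
print(f"-d_eff / (2 * eta):    {-d_eff / (2 * eta):.4f}")
```

Because $t/(1+t) \le \log(1+t)$ for $t \ge 0$, the effective dimension under these definitions never exceeds twice the information gain, which is consistent with the abstract's remark that the effective dimension typically results in better rates.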
