Relative Information Gain and Gaussian Process Regression

Hamish Flynn

TL;DR

The paper introduces the relative information gain $\gamma_n(\eta,\beta)$, a complexity measure for kernel-based regression that interpolates between the effective dimension and the information gain and has the same growth rate as the effective dimension. It derives a localised PAC-Bayesian excess risk bound for Gaussian process regression whose complexity term naturally yields the relative information gain, and establishes upper bounds on $\gamma_n(\eta,\beta)$ in terms of the spectral decay of Mercer kernels. Combining these bounds with the excess risk bound gives minimax-optimal rates of convergence, stated explicitly for polynomial and exponential eigenvalue decay. The results connect the spectral properties of the kernel to learning rates and give risk guarantees for GP regression with fixed design.

Abstract

The sample complexity of estimating or maximising an unknown function in a reproducing kernel Hilbert space is known to be linked to both the effective dimension and the information gain associated with the kernel. While the information gain has an attractive information-theoretic interpretation, the effective dimension typically results in better rates. We introduce a new quantity called the relative information gain, which measures the sensitivity of the information gain with respect to the observation noise. We show that the relative information gain smoothly interpolates between the effective dimension and the information gain, and that the relative information gain has the same growth rate as the effective dimension. In the second half of the paper, we prove a new PAC-Bayesian excess risk bound for Gaussian process regression. The relative information gain arises naturally from the complexity term in this PAC-Bayesian bound. We prove bounds on the relative information gain that depend on the spectral properties of the kernel. When these upper bounds are combined with our excess risk bound, we obtain minimax-optimal rates of convergence.
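To make the two classical quantities in the abstract concrete, here is a minimal Python sketch. It is not the paper's construction, and it does not reproduce the paper's definition of $\gamma_n(\eta,\beta)$. It uses the standard textbook forms, information gain $\frac{1}{2}\log\det(I + K/\eta)$ and effective dimension $\mathrm{tr}(K(K+\eta I)^{-1})$, for a kernel matrix $K$ and noise level $\eta$. The RBF kernel, the lengthscale, the input grid and the noise value are all illustrative assumptions, and normalisation conventions may differ from the paper's. The sketch also checks numerically that the derivative of the information gain with respect to $\log\eta$ equals minus one half of the effective dimension, which is one elementary sense in which the sensitivity of the information gain to the noise is tied to the effective dimension.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2):
    # Squared-exponential kernel matrix on 1-D inputs (illustrative choice).
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def information_gain(K, noise):
    # 0.5 * log det(I + K / noise), computed from the eigenvalues of K.
    eigs = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return 0.5 * np.sum(np.log1p(eigs / noise))

def effective_dimension(K, noise):
    # tr(K (K + noise * I)^{-1}) = sum_i lambda_i / (lambda_i + noise).
    eigs = np.clip(np.linalg.eigvalsh(K), 0.0, None)
    return np.sum(eigs / (eigs + noise))

# Hypothetical fixed design: 50 equally spaced points on [0, 1].
x = np.linspace(0.0, 1.0, 50)
K = rbf_kernel(x)
noise = 0.1  # assumed noise level, chosen only for illustration

ig = information_gain(K, noise)
d_eff = effective_dimension(K, noise)

# Central finite difference of the information gain in log(noise).
h = 1e-5
slope = (information_gain(K, noise * np.exp(h))
         - information_gain(K, noise * np.exp(-h))) / (2 * h)

print(f"information gain:     {ig:.4f}")
print(f"effective dimension:  {d_eff:.4f}")
print(f"d IG / d log(noise):  {slope:.4f}  (compare -0.5 * d_eff = {-0.5 * d_eff:.4f})")
```

Because $x/(1+x) \le \log(1+x)$ for $x \ge 0$, the effective dimension computed this way never exceeds twice the information gain, which is consistent with the abstract's remark that the effective dimension typically gives better rates.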
