Wasserstein-type Gaussian Process Regressions for Input Measurement Uncertainty

Hengrui Luo, Xiaoye S. Li, Yang Liu, Marcus Noack, Ji Qiang, Mark D. Risser

Abstract

Gaussian process (GP) regression is widely used for uncertainty quantification, yet the standard formulation assumes noise-free covariates. When inputs are measured with error, this errors-in-variables (EIV) setting can lead to optimistically narrow posterior intervals and biased decisions. We study GP regression under input measurement uncertainty by representing each noisy input as a probability measure and defining covariance through Wasserstein distances between these measures. Building on this perspective, we instantiate a deterministic projected Wasserstein ARD (PWA) kernel whose one-dimensional components admit closed-form expressions and whose product structure yields a scalable, positive-definite kernel on distributions. Unlike latent-input GP models, PWA-based GPs (PWA-GPs) handle input noise without introducing unobserved covariates or Monte Carlo projections, making uncertainty quantification more transparent and robust.
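To make the idea of a product kernel built from one-dimensional Wasserstein distances concrete, here is a minimal sketch. It is not the paper's PWA definition, which this page does not show. It assumes each noisy input coordinate is modeled as a Gaussian $N(m, s^2)$, uses the known closed form $W_2^2 = (m_1-m_2)^2 + (s_1-s_2)^2$ for one-dimensional Gaussians, and multiplies squared-exponential factors with one length scale per dimension (ARD). The names `w2_sq_gauss_1d`, `pwa_like_kernel`, `lengthscales`, and `variance` are hypothetical.

```python
import math


def w2_sq_gauss_1d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def pwa_like_kernel(mu_a, sd_a, mu_b, sd_b, lengthscales, variance=1.0):
    """Product over dimensions of exp(-W2^2 / (2 * ell_d^2)).

    Assumption: each input is a vector of independent 1-D Gaussians,
    given by per-dimension means (mu_*) and standard deviations (sd_*).
    Under this assumption the kernel equals an RBF kernel on the
    stacked (mean, std) coordinates, hence it is positive definite.
    """
    log_k = 0.0
    for m1, s1, m2, s2, ell in zip(mu_a, sd_a, mu_b, sd_b, lengthscales):
        log_k -= w2_sq_gauss_1d(m1, s1, m2, s2) / (2.0 * ell ** 2)
    return variance * math.exp(log_k)


# Two 2-D inputs, each described by per-dimension means and noise levels.
a = ([0.0, 1.0], [0.01, 0.05])
b = ([0.2, 0.8], [0.05, 0.05])
k_ab = pwa_like_kernel(*a, *b, lengthscales=[0.5, 1.0])
k_aa = pwa_like_kernel(*a, *a, lengthscales=[0.5, 1.0])
```

In this sketch two inputs with identical means but different noise levels are no longer perfectly correlated, which is one way input uncertainty can enter the covariance without adding latent covariates.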

Paper Structure

This paper contains 29 sections, 8 theorems, 80 equations, 2 figures, and 4 tables.

Key Result

Proposition 1

Consider the errors-in-variables model with $\varepsilon_X \perp \varepsilon$. Let $f(x)=c+w^\top x$ be affine with $w\neq 0$. Define the naive $(1-\alpha)$ interval that ignores input noise: Then for any fixed $X$, whenever $w^\top\Sigma_X w>0$.
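The effect of ignoring input noise in the affine case can be illustrated with a short calculation. The sketch below is not the paper's statement, whose displayed equations are not reproduced on this page. It assumes Gaussian noise throughout: an observed input $U = X + \varepsilon_X$ with $\varepsilon_X \sim N(0, \Sigma_X)$ and diagonal $\Sigma_X$, a response $Y = f(X) + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, and a naive interval of the form $f(U) \pm z_{1-\alpha/2}\,\sigma$. Under these assumptions $Y - f(U) = \varepsilon - w^\top \varepsilon_X \sim N(0, \sigma^2 + w^\top \Sigma_X w)$, so the coverage of the naive interval does not depend on $X$ and falls below $1-\alpha$ whenever $w^\top \Sigma_X w > 0$. The function name `naive_coverage` and its parameters are hypothetical.

```python
from statistics import NormalDist


def naive_coverage(w, sigma_x_diag, sigma_y, alpha=0.05):
    """Coverage of f(U) +/- z * sigma_y for affine f under the assumptions above.

    w            : slope vector of the affine function f(x) = c + w^T x
    sigma_x_diag : diagonal of Sigma_X (per-coordinate input noise variances)
    sigma_y      : standard deviation of the output noise
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # w^T Sigma_X w for a diagonal Sigma_X (an assumption of this sketch).
    input_var = sum(wi ** 2 * vi for wi, vi in zip(w, sigma_x_diag))
    total_sd = (sigma_y ** 2 + input_var) ** 0.5
    # P(|N(0, total_sd^2)| <= z * sigma_y)
    return 2.0 * std_normal.cdf(z * sigma_y / total_sd) - 1.0


# No input noise: the naive interval attains its nominal 95% level.
cov_clean = naive_coverage(w=[2.0], sigma_x_diag=[0.0], sigma_y=0.1)
# Input noise with standard deviation 0.05: coverage drops below 95%.
cov_noisy = naive_coverage(w=[2.0], sigma_x_diag=[0.05 ** 2], sigma_y=0.1)
```

The gap between `cov_clean` and `cov_noisy` grows with $w^\top \Sigma_X w$, so steeper functions and noisier inputs make the naive interval more optimistic.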

Figures (2)

  • Figure 1: Illustration of the errors-in-variables regression problem. In both panels, the true function is $y=f(X)=\frac{\sin(10\pi\cdot X)}{2X}+(X-1)^{4}$, but in each case the “true” input locations $X$ are contaminated with measurement errors (with standard deviations of 0.01 and 0.05 on the left and right, respectively). To infer the true function (the blue line) accurately, a GP must account for the fact that the input locations $U$ are uncertain.
  • Figure 2: RMSE (left) and CRPS (right) for GP-RBF vs. WGP-RBF across training/testing years 1987-2022. GP-RBF degrades when trained on recent and tested on older years, indicating poor generalization under temporal shifts. WGP-RBF remains robust, consistently achieving lower errors and better probabilistic calibration.

Theorems & Definitions (15)

  • Proposition 1
  • Definition 2
  • Definition 3
  • Proposition 4
  • Theorem 5
  • Corollary 6
  • Corollary 7
  • Proposition 8
  • Proposition 9
  • proof
  • ...and 5 more