Sparse Variational Contaminated Noise Gaussian Process Regression with Applications in Geomagnetic Perturbations Forecasting

Daniel Iong; Matthew McAnear; Yuezhou Qu; Shasha Zou; Gabor Toth; Yang Chen

TL;DR

This work addresses robust Gaussian process regression in large-scale data subject to outliers and heteroscedastic noise by introducing a contaminated normal (CN) noise model within the Sparse Variational GP (SVGP) framework. The authors develop a stochastic generalized alternating maximization (SGAM) inference algorithm to jointly estimate CN hyperparameters and GP hyperparameters, achieving scalable inference on big datasets. Through simulations, they show parameter recovery and superior robustness of the CN-SVGPM model compared to Gaussian, Student-t, and Laplace noise models, particularly as outlier prevalence increases. They validate the approach on real-world datasets, including flight delays and ground magnetic perturbations, showing more informative and well-calibrated predictive intervals without sacrificing accuracy relative to neural network baselines. The method offers a practical, scalable solution for reliable uncertainty quantification in domains with frequent outliers, such as space weather forecasting and transportation systems.

Abstract

Gaussian processes (GPs) have become popular machine-learning methods for kernel-based learning on datasets with complicated covariance structures. In this paper, we present a novel extension to the GP framework that uses a contaminated normal likelihood function to better account for heteroscedastic variance and outlier noise. We propose a scalable inference algorithm based on the Sparse Variational Gaussian Process (SVGP) method for fitting sparse Gaussian process regression models with contaminated normal noise on large datasets. We examine an application to geomagnetic ground perturbations, where the state-of-the-art prediction model is based on neural networks. We show that our approach yields shorter prediction intervals at similar coverage and accuracy when compared with a dense artificial neural network baseline.
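To make the contaminated normal likelihood mentioned above concrete, the following is a minimal sketch of its log density, assuming the common two-component parameterization: with probability 1 - alpha the noise has variance sigma2, and with probability alpha the variance is inflated to tau * sigma2 with tau > 1. The function names and the parameter names (alpha, tau, sigma2) are illustrative assumptions and may not match the paper's notation. The responsibility function is the standard mixture posterior, not a reproduction of the authors' inference algorithm.

```python
import math


def normal_logpdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def cn_logpdf(y, f, sigma2, alpha, tau):
    """Log density of a two-component contaminated normal centred at f.

    Assumed parameterization (hypothetical, may differ from the paper):
      regular component: weight 1 - alpha, variance sigma2
      outlier component: weight alpha,     variance tau * sigma2, tau > 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if tau <= 1.0 or sigma2 <= 0.0:
        raise ValueError("need tau > 1 and sigma2 > 0")
    log_regular = math.log1p(-alpha) + normal_logpdf(y, f, sigma2)
    log_outlier = math.log(alpha) + normal_logpdf(y, f, tau * sigma2)
    # Log-sum-exp of the two weighted components for numerical stability.
    m = max(log_regular, log_outlier)
    return m + math.log(math.exp(log_regular - m) + math.exp(log_outlier - m))


def outlier_responsibility(y, f, sigma2, alpha, tau):
    """Posterior probability that y came from the inflated-variance component."""
    log_outlier = math.log(alpha) + normal_logpdf(y, f, tau * sigma2)
    return math.exp(log_outlier - cn_logpdf(y, f, sigma2, alpha, tau))


if __name__ == "__main__":
    # A large residual is far less surprising under the CN model than under
    # a plain Gaussian with the same base variance.
    print(cn_logpdf(6.0, 0.0, 1.0, 0.1, 9.0), normal_logpdf(6.0, 0.0, 1.0))
    print(outlier_responsibility(6.0, 0.0, 1.0, 0.1, 9.0))
```

Because the outlier component keeps non-negligible mass far from the latent function value f, a single extreme observation pulls the fit much less than it would under a Gaussian likelihood, which is the robustness property the abstract refers to.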

Paper Structure (21 sections, 56 equations, 10 figures, 9 tables, 1 algorithm)

Introduction
Background
Gaussian process regression
Sparse variational GPs (SVGP)
Methods
Model specification
Inference
Simulation Studies
Parameter estimation
Comparison with other robust likelihoods
Applications
Flight delays
Ground magnetic perturbations
Data Setup
Models
...and 6 more sections

Figures (10)

Figure 1: Raw $\delta B_H$ values for the Ottawa magnetometer.
Figure 2: Simulated data for our first simulation study. The true function from eq. (\ref{eq:sim1-func}) is shown in red. Noisy observations are plotted on the left.
Figure 3: Parameter estimates and ELBO value across iterations for a specific run. Red dashed lines show the true parameter value.
Figure 4: Boxplots for estimated parameter values from our proposed algorithm applied to 200 different simulated datasets. Red dashed lines show the true parameter value.
Figure 5: Function estimates from our proposed algorithm applied to 200 different simulated datasets. The true function from eq. (\ref{eq:sim1-func}) is shown in red. Estimated mean functions are shown in gray on the left. RMSEs for the estimated mean functions are given in the boxplot on the right.
...and 5 more figures
