Robust and Computation-Aware Gaussian Processes
Marshal Arijona Sinaga, Julien Martinelli, Samuel Kaski
TL;DR
Gaussian processes often struggle with scalability and robustness on large, outlier-contaminated datasets. The authors propose the Robust Computation-aware Gaussian Process (RCaGP), which unifies robust-conjugate GP inference with computation-aware approximations, yielding more conservative uncertainty estimates and improved reliability. Theoretical results establish robustness to outliers through a bounded posterior influence function and provide worst-case error guarantees, while end-to-end optimization via EULBO enables joint model and acquisition design. Empirically, RCaGP outperforms baselines across regression and high-throughput BO tasks, and an expert-guided robust mean prior further improves performance, which suggests practical value for reliable large-scale probabilistic inference and optimization.
Abstract
Gaussian processes (GPs) are widely used for regression and optimization tasks such as Bayesian optimization (BO) due to their expressiveness and principled uncertainty estimates. However, in settings with large datasets corrupted by outliers, standard GPs and their sparse approximations struggle with computational tractability and robustness. We introduce the Robust Computation-aware Gaussian Process (RCaGP), a novel GP model that jointly addresses these challenges by combining a principled treatment of approximation-induced uncertainty with robust generalized Bayesian updating. The key insight is that robustness and approximation-awareness are not orthogonal but intertwined: approximations can exacerbate the impact of outliers, and mitigating one without the other is insufficient. Unlike previous work that focuses narrowly on either robustness or approximation quality, RCaGP combines both in a principled and scalable framework, thus effectively managing both outliers and the computational uncertainty introduced by approximations such as low-rank matrix multiplications. Our model ensures more conservative and reliable uncertainty estimates, a property we rigorously demonstrate. Additionally, we establish a robustness property and show that the mean function is key to preserving it, motivating a tailored model selection scheme for robust mean functions. Empirical results confirm that solving these challenges jointly leads to superior performance in both clean and outlier-contaminated settings, on regression as well as high-throughput Bayesian optimization benchmarks.
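To make the outlier sensitivity of standard GPs concrete, the following is a minimal sketch of ordinary exact GP regression with a Gaussian likelihood, not the RCaGP method. The kernel, hyperparameters, toy data, and helper names (`rbf_kernel`, `gp_posterior_mean`, `mean_shift`) are all assumptions chosen for illustration. Because the standard posterior mean is linear in the observations, the shift caused by a single corrupted point grows in proportion to the size of the corruption, so its influence is unbounded.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=0.1):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

# Toy data (assumed): noiseless samples of sin(x) on [0, 5].
x_train = np.linspace(0.0, 5.0, 20)
y_clean = np.sin(x_train)
x_test = np.array([2.5])
outlier_index = 10  # training point closest to the test location

def mean_shift(outlier_size):
    """Absolute change in the posterior mean at x_test when one
    observation is corrupted by adding outlier_size to it."""
    y_corrupt = y_clean.copy()
    y_corrupt[outlier_index] += outlier_size
    clean = gp_posterior_mean(x_train, y_clean, x_test)
    corrupt = gp_posterior_mean(x_train, y_corrupt, x_test)
    return float(np.abs(corrupt - clean)[0])

shift_10 = mean_shift(10.0)
shift_100 = mean_shift(100.0)
# The second shift is ten times the first, up to floating-point error.
print(shift_10, shift_100)
```

A model with a bounded posterior influence function, the robustness property referred to above, would instead keep this shift from growing without limit as the corruption increases.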
