Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel
Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis
TL;DR
This work tackles scalable Gaussian process (GP) modeling of multi-dimensional data. It introduces the grid spectral mixture product (GSMP) kernel, which reduces the number of hyper-parameters while preserving expressive power and whose hyper-parameter optimization yields sparse solutions. It pairs GSMP with SLIM-KL, a sparsity-aware distributed learning framework in which agents share global hyper-parameters through a quantized ADMM scheme and solve their local optimization problems with DSCA, enabling privacy-preserving, communication-efficient training. Theoretical results guarantee the convergence of DSCA within the quantized ADMM and bound the quantization error, and experiments show improved predictive performance and scalability across diverse datasets, including long time series. The approach reduces model and computational complexity and offers practical benefits for large-scale, distributed GP applications in engineering.
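A minimal NumPy sketch of a GSMP-style kernel follows, to make "fewer hyper-parameters" concrete. It rests on assumptions not spelled out above: each input dimension uses a one-dimensional grid spectral mixture kernel whose Q spectral means and bandwidths are fixed on a grid, the only learned hyper-parameters are nonnegative weights theta, and the multi-dimensional kernel is the product of these per-dimension mixtures. Under these assumptions a P-dimensional kernel has P*Q weights, and a zero weight simply switches off one sub-kernel, which is where sparsity would appear. The function names, grid values, and bandwidths are illustrative only.

```python
import numpy as np

def gsm_basis(tau, mu, sigma):
    """Evaluate Q fixed spectral mixture sub-kernels on pairwise differences.

    tau: (N, M) differences along one input dimension.
    mu, sigma: (Q,) fixed grid of spectral means and standard deviations.
    Returns an array of shape (Q, N, M).
    """
    tau = tau[None, :, :]
    mu = mu[:, None, None]
    sigma = sigma[:, None, None]
    return np.exp(-2.0 * np.pi**2 * tau**2 * sigma**2) * np.cos(2.0 * np.pi * tau * mu)

def gsmp_kernel(X, Y, theta, mu, sigma):
    """Hypothetical GSMP-style kernel: product over dimensions of weighted 1-D GSM kernels.

    X: (N, P), Y: (M, P) inputs.
    theta: (P, Q) nonnegative weights, the only learned hyper-parameters.
    mu, sigma: (P, Q) grids, assumed fixed in advance and not learned.
    """
    K = np.ones((X.shape[0], Y.shape[0]))
    for p in range(X.shape[1]):
        tau = X[:, p][:, None] - Y[:, p][None, :]
        basis = gsm_basis(tau, mu[p], sigma[p])     # (Q, N, M)
        K *= np.tensordot(theta[p], basis, axes=1)  # weighted sum over the grid
    return K
```

For example, with P = 2 dimensions and a grid of Q = 20 frequencies, theta has only 40 entries. Because each factor is a nonnegative combination of positive semidefinite spectral mixture components, the product is itself a valid kernel, so pruning weights to zero never breaks positive semidefiniteness.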
Abstract
Gaussian processes (GPs) are crucial tools in machine learning and signal processing, and their effectiveness hinges on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize its hyper-parameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capability. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit this inherent sparsity, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where each local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel while ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, and experiments on diverse datasets demonstrate the superior prediction performance and efficiency of the proposed methods.
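The collaborative step can be illustrated with a generic consensus ADMM in which agents upload quantized messages. This is a sketch under stated assumptions, not the SLIM-KL algorithm: a least-squares loss stands in for each agent's local GP objective, the local step is solved in closed form rather than by DSCA, only the agent-to-server messages are quantized (with a deterministic uniform quantizer), and the global variable is projected onto the nonnegative orthant to mimic nonnegative kernel weights. All names and parameter values (rho, step, iters) are hypothetical.

```python
import numpy as np

def quantize(v, step):
    """Deterministic uniform quantizer; per-entry error is at most step / 2."""
    return step * np.round(v / step)

def quantized_consensus_admm(A_list, b_list, rho=1.0, step=1e-3, iters=200):
    """Consensus ADMM in which agents upload only quantized messages.

    Assumption: agent j privately holds (A_j, b_j) and a quadratic stand-in
    loss f_j(x) = ||A_j x - b_j||^2 / (2 m_j) instead of a GP objective.
    The global variable z is kept nonnegative, like nonnegative kernel weights.
    """
    d = A_list[0].shape[1]
    J = len(A_list)
    x = np.zeros((J, d))
    u = np.zeros((J, d))  # scaled dual variables, one row per agent
    z = np.zeros(d)
    for _ in range(iters):
        for j, (A, b) in enumerate(zip(A_list, b_list)):
            m = A.shape[0]
            # Local step: closed-form minimiser of f_j(x) + (rho/2)||x - z + u_j||^2.
            lhs = A.T @ A / m + rho * np.eye(d)
            rhs = A.T @ b / m + rho * (z - u[j])
            x[j] = np.linalg.solve(lhs, rhs)
        # Agents send quantized x_j + u_j; raw data never leaves the agent.
        msgs = quantize(x + u, step)
        # Global step: average the messages, then project onto z >= 0.
        z = np.maximum(msgs.mean(axis=0), 0.0)
        # Dual step, computed locally by each agent.
        u += x - z
    return z
```

Only quantized vectors cross the network, so the raw data (A_j, b_j) stays with each agent. A smaller step lowers the rounding error (at most step / 2 per entry) at the cost of more bits per message, so the consensus variable settles near, rather than exactly at, the centralized solution.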
