Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis

TL;DR

This work tackles scalable Gaussian process (GP) modeling for multi-dimensional data by introducing the grid spectral mixture product (GSMP) kernel, a sparsity-promoting kernel that reduces the number of hyper-parameters while preserving expressive power. It couples GSMP with SLIM-KL, a sparsity-aware distributed learning framework that uses a quantized alternating direction method of multipliers (ADMM) scheme to share global hyper-parameters among agents and a distributed successive convex approximation (DSCA) algorithm to solve the local optimization problems, enabling privacy-preserving, communication-efficient training. Theoretical results guarantee convergence of DSCA within the quantized ADMM scheme and bound the quantization error, while experiments show improved predictive performance and scalability across diverse datasets and long time-series tasks. The approach reduces model and computational complexity, achieves strong empirical accuracy, and offers practical benefits for large-scale, distributed GP applications in engineering contexts.
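To make the "fewer hyper-parameters" claim concrete, here is a minimal sketch of a product-of-grid-spectral-mixtures kernel. It assumes (our assumption, not the paper's stated equation) that each input dimension contributes a one-dimensional spectral mixture, a weighted sum of Gaussian-windowed cosines whose frequencies and bandwidths are fixed on a grid shared by all dimensions, so only nonnegative weights remain to be learned. The functions `gsm_1d` and `gsmp_kernel` are hypothetical names used for illustration.

```python
import math
from typing import Sequence


def gsm_1d(tau: float, weights: Sequence[float],
           means: Sequence[float], stds: Sequence[float]) -> float:
    """1-D grid spectral mixture: a weighted sum of Gaussian-windowed cosines.

    `means` and `stds` are fixed on a frequency grid; only the nonnegative
    `weights` would be learned (assumed form, for illustration only).
    """
    return sum(
        w * math.exp(-2.0 * math.pi ** 2 * s ** 2 * tau ** 2)
        * math.cos(2.0 * math.pi * m * tau)
        for w, m, s in zip(weights, means, stds)
    )


def gsmp_kernel(x: Sequence[float], y: Sequence[float],
                weights_per_dim: Sequence[Sequence[float]],
                means: Sequence[float], stds: Sequence[float]) -> float:
    """Hypothetical GSMP sketch: product over input dimensions of 1-D GSM kernels.

    Every dimension shares the same frequency grid (an assumption); dimension p
    has its own weight vector weights_per_dim[p], giving P * Q weights in total.
    """
    k = 1.0
    for p, (xp, yp) in enumerate(zip(x, y)):
        k *= gsm_1d(xp - yp, weights_per_dim[p], means, stds)
    return k
```

For example, with Q = 3 grid points and two input dimensions, `weights_per_dim = [[0.5, 0.0, 0.3], [0.2, 0.1, 0.0]]` gives k(x, x) = (0.5 + 0.3) × (0.2 + 0.1) = 0.24 at zero lag, up to floating-point rounding. The zero entries illustrate the kind of sparse weight vectors that the hyper-parameter optimization is reported to produce.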

Abstract

Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture product (GSMP) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capability. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.
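The collaborative part of the framework follows the consensus-ADMM pattern: agents optimize local copies of the shared hyper-parameters, send (quantized) messages to a central node, and receive a global average back. The sketch below is generic consensus ADMM with quantized messages, not the paper's SLIM-KL updates: it assumes a toy scalar quadratic 0.5·a_j·(θ − c_j)² in place of each agent's GP objective, solves the local step in closed form rather than with DSCA, and uses plain rounding onto a grid of resolution Δ. The names `quantize` and `quantized_consensus_admm` are hypothetical.

```python
from typing import List, Optional


def quantize(v: float, delta: float) -> float:
    """Deterministic uniform quantizer: round v to the nearest multiple of delta."""
    return delta * round(v / delta)


def quantized_consensus_admm(a: List[float], c: List[float], rho: float = 1.0,
                             delta: Optional[float] = None, iters: int = 300) -> float:
    """Consensus ADMM (scaled form) for min_theta sum_j 0.5 * a_j * (theta - c_j)**2.

    Each agent j keeps a local copy theta_j and a scaled dual variable u_j; the
    central node only sees the (optionally quantized) messages theta_j + u_j
    and broadcasts their average z.
    """
    n = len(a)
    theta = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iters):
        # Local step: closed-form minimizer of the toy quadratic plus the ADMM penalty.
        theta = [(a[j] * c[j] + rho * (z - u[j])) / (a[j] + rho) for j in range(n)]
        # Communication step: agents send theta_j + u_j, quantized if delta is given.
        msgs = [theta[j] + u[j] for j in range(n)]
        if delta is not None:
            msgs = [quantize(m, delta) for m in msgs]
        # Global step: the central node averages the received messages.
        z = sum(msgs) / n
        # Dual step: each agent accumulates its own consensus violation.
        u = [u[j] + theta[j] - z for j in range(n)]
    return z
```

With `a = [1, 2, 3, 4]` and `c = [0.5, -1.0, 2.0, 0.25]`, the unquantized run approaches 5.5 / 10 = 0.55, the minimizer of the summed objective. Passing `delta=1e-3` lands close to, but generally not exactly on, that value, because every broadcast carries a rounding error of at most Δ/2; bounding this kind of error is what the quantization analysis addresses.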

Paper Structure (24 sections, 6 theorems, 38 equations, 5 figures, 9 tables, 1 algorithm)

Key Result

Theorem 1

The spectral density of the GSMP kernel, as defined in the paper, is a Gaussian mixture.
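For intuition, the following numerical check covers the one-dimensional building block. It assumes (our assumption, not the paper's stated equation) that each GSMP factor is a sum of spectral mixture terms w·exp(-2π²σ²τ²)·cos(2πμτ). By Bochner's theorem, such a term has the symmetric Gaussian-mixture spectral density (w/2)[N(f; μ, σ²) + N(f; -μ, σ²)], and integrating that density against cos(2πfτ) recovers the kernel. Since the Fourier transform of a product across input dimensions is the product of the per-dimension transforms, a product of such factors has a spectral density that expands into a mixture of Gaussians with diagonal covariances, which is consistent with the theorem. All function names are hypothetical.

```python
import math
from typing import Sequence


def gauss_pdf(f: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) evaluated at f."""
    return math.exp(-0.5 * ((f - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def spectral_density_1d(f: float, weights: Sequence[float],
                        means: Sequence[float], stds: Sequence[float]) -> float:
    """Symmetrized Gaussian mixture: each component has mass w/2 at +mu and at -mu."""
    return sum(0.5 * w * (gauss_pdf(f, m, s) + gauss_pdf(f, -m, s))
               for w, m, s in zip(weights, means, stds))


def kernel_from_density(tau: float, weights, means, stds,
                        f_max: float = 5.0, n: int = 10000) -> float:
    """k(tau) = integral of S(f) * cos(2 pi f tau) df, midpoint rule on [-f_max, f_max]."""
    h = 2.0 * f_max / n
    total = 0.0
    for i in range(n):
        f = -f_max + (i + 0.5) * h
        total += spectral_density_1d(f, weights, means, stds) * math.cos(2.0 * math.pi * f * tau)
    return total * h


def gsm_closed_form(tau: float, weights, means, stds) -> float:
    """Closed-form spectral mixture kernel in the assumed 1-D form."""
    return sum(w * math.exp(-2.0 * math.pi ** 2 * s ** 2 * tau ** 2)
               * math.cos(2.0 * math.pi * m * tau)
               for w, m, s in zip(weights, means, stds))
```

For weights `[0.6, 0.0, 0.4]`, means `[0.2, 0.5, 1.0]` and standard deviations of 0.1, the numerical and closed-form values agree closely at any lag; at τ = 0 both equal the total weight 1.0, the integral of the density.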

Figures (5)

  • Figure 1: The proposed Sparse Linear Multiple Kernel Learning (SLIM-KL) framework, featuring a quantized ADMM scheme for collaborative hyper-parameter learning across multiple agents, and a distributed SCA algorithm for local optimization using multi-core computing units.
  • Figure 2: The learned spectral density of the multi-dimensional GSM kernel (left) versus that of the GSMP kernel (right). The cross symbols mark the modes of the ground truth.
  • Figure 3: Total computation time (log scale) for one computing unit, for different values of $s$.
  • Figure 4: Performance comparison of SLIM-KL under stochastic versus deterministic quantization, with $\Delta = 0.01$. The blue dots with vertical error bars indicate the mean MSE ± two standard deviations under stochastic quantization, while the red dashed lines represent the MSE under deterministic quantization.
  • Figure 5: Average saving ratio in transmitting the local hyper-parameters with versus without quantization, for different quantization resolutions $\Delta$.
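Figures 4 and 5 contrast stochastic and deterministic quantization at resolution $\Delta$. The sketch below shows the two common choices on a uniform grid; the paper's exact quantizer is not reproduced here, so both functions (hypothetical names) are assumptions. Deterministic rounding always returns the nearest grid point. Stochastic rounding picks one of the two neighbouring grid points at random, with probabilities chosen so that the output equals the input on average. This unbiasedness is why repeated stochastic runs scatter around a mean, as the error bars in Figure 4 show.

```python
import math
import random


def deterministic_quantize(v: float, delta: float) -> float:
    """Round v to the nearest point on the grid {k * delta}."""
    return delta * round(v / delta)


def stochastic_quantize(v: float, delta: float, rng: random.Random) -> float:
    """Unbiased stochastic rounding onto the grid {k * delta}.

    v lies between lower * delta and (lower + 1) * delta; it is rounded up with
    probability equal to its fractional position, so E[output] = v.
    """
    lower = math.floor(v / delta)
    frac = v / delta - lower
    level = lower + 1 if rng.random() < frac else lower
    return level * delta
```

With `v = 0.1234` and `delta = 0.01`, the deterministic quantizer always returns 0.12. The stochastic one returns 0.12 or 0.13, and its sample mean over many draws approaches 0.1234.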

Theorems & Definitions (18)

  • Theorem 1
  • Proof
  • Example 1
  • Theorem 2
  • Proof
  • Remark 1
  • Remark 2
  • Theorem 3
  • Proof
  • Theorem 4
  • ...and 8 more