Parallel Gaussian process with kernel approximation in CUDA

Davide Carminati

TL;DR

A parallel CUDA/C++ implementation of the Gaussian process with a decomposed kernel, which parallelizes the computation of the predictive posterior statistics on a GPU using CUDA and its libraries.

Abstract

This paper introduces a parallel implementation in CUDA/C++ of the Gaussian process with a decomposed kernel. This recent formulation, introduced by Joukov and Kulić (2022), requires inverting an approximated, but much smaller, matrix than the plain Gaussian process does. However, its execution times degrade when dealing with higher-dimensional samples. The solution presented in this paper parallelizes the computation of the predictive posterior statistics on a GPU using CUDA and its libraries. The CPU and GPU code are then benchmarked on different CPU-GPU configurations to show the benefits of the GPU implementation over the CPU one.
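As a rough CPU reference for what the "predictive posterior statistics" of a decomposed kernel involve, the sketch below assumes a generic degenerate kernel k(x, x') = φ(x)^T diag(λ) φ(x') with m basis functions and Gaussian noise of variance σ². Under that assumption (a standard weight-space identity, not necessarily the exact formulation of Joukov and Kulić), the predictive mean and variance at a test point x* are φ*^T A^{-1} Φ^T y and σ² φ*^T A^{-1} φ*, where A = Φ^T Φ + σ² diag(1/λ). Only the m×m matrix A is factorized, instead of the N×N kernel matrix. The function name `predict`, the row-major layout and the hand-written Cholesky are illustrative choices; a GPU version could, for example, delegate the Φ^T Φ product to cuBLAS and the factorization to cuSOLVER.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Dense row-major storage: element (i, j) of an r x c matrix is data[i * c + j].
using Vec = std::vector<double>;

struct Prediction {
    double mean;
    double variance;
};

// In-place Cholesky factorization A = L L^T of an m x m SPD matrix.
// On return, the lower triangle of A holds L (the upper triangle is left untouched).
void cholesky(Vec& A, std::size_t m) {
    for (std::size_t j = 0; j < m; ++j) {
        double d = A[j * m + j];
        for (std::size_t k = 0; k < j; ++k) d -= A[j * m + k] * A[j * m + k];
        if (d <= 0.0) throw std::runtime_error("matrix not positive definite");
        A[j * m + j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < m; ++i) {
            double s = A[i * m + j];
            for (std::size_t k = 0; k < j; ++k) s -= A[i * m + k] * A[j * m + k];
            A[i * m + j] = s / A[j * m + j];
        }
    }
}

// Forward substitution: solve L z = b using the lower triangle of L.
Vec forward_solve(const Vec& L, const Vec& b, std::size_t m) {
    Vec z(m);
    for (std::size_t i = 0; i < m; ++i) {
        double s = b[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i * m + k] * z[k];
        z[i] = s / L[i * m + i];
    }
    return z;
}

// Backward substitution: solve L^T x = z using the lower triangle of L.
Vec backward_solve(const Vec& L, const Vec& z, std::size_t m) {
    Vec x(m);
    for (std::size_t i = m; i-- > 0;) {
        double s = z[i];
        for (std::size_t k = i + 1; k < m; ++k) s -= L[k * m + i] * x[k];
        x[i] = s / L[i * m + i];
    }
    return x;
}

// Hypothetical helper: predictive mean/variance of the latent function at one test point.
// Phi: n_train x m features, y: n_train targets, lambda: m eigenvalues,
// sigma2: noise variance, phi_star: m features of the test point.
Prediction predict(const Vec& Phi, const Vec& y, const Vec& lambda, double sigma2,
                   const Vec& phi_star, std::size_t n_train, std::size_t m) {
    assert(Phi.size() == n_train * m && y.size() == n_train);
    assert(lambda.size() == m && phi_star.size() == m);

    // A = Phi^T Phi + sigma2 * diag(1 / lambda): m x m, independent of n_train.
    Vec A(m * m, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t r = 0; r < n_train; ++r)
                A[i * m + j] += Phi[r * m + i] * Phi[r * m + j];
    for (std::size_t i = 0; i < m; ++i) A[i * m + i] += sigma2 / lambda[i];

    // b = Phi^T y
    Vec b(m, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t r = 0; r < n_train; ++r) b[i] += Phi[r * m + i] * y[r];

    cholesky(A, m);
    Vec w = backward_solve(A, forward_solve(A, b, m), m);  // posterior weight mean A^{-1} b
    Vec v = forward_solve(A, phi_star, m);                 // phi*^T A^{-1} phi* = v^T v

    Prediction p{0.0, 0.0};
    for (std::size_t i = 0; i < m; ++i) {
        p.mean += phi_star[i] * w[i];
        p.variance += v[i] * v[i];
    }
    p.variance *= sigma2;
    return p;
}
```

For example, with features φ(x) = [1, x], λ = (1, 1), σ² = 1, training inputs x = (0, 1), targets y = (1, 3) and test input x* = 2, this gives mean 3 and variance 2, which matches the plain function-space Gaussian process with kernel k(x, x') = 1 + x x'.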

Paper Structure

This paper contains 9 sections, 18 equations, 1 figure, and 1 table.

Introduction
Theoretical background
Gaussian process regression
Gaussian process regression with a decomposed kernel
Kernel decomposition
Implementation details
Results and discussion
Conclusion
Computational details

Figures (1)

Figure 1: Execution times when varying the number of dimensions $p$ and eigenvalues $n$
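One plausible reading of why both p and n drive the execution times, assuming (not confirmed here) that the multivariate kernel is approximated by a tensor product of n one-dimensional eigenfunctions per input dimension, so that the feature matrix Φ has M columns for N training samples:

```latex
M = n^{p}, \qquad
\underbrace{\Phi^{\top}\Phi}_{M \times M}:\ \mathcal{O}(N M^{2}) = \mathcal{O}(N\, n^{2p}), \qquad
\text{Cholesky of the } M \times M \text{ matrix}:\ \mathcal{O}(M^{3}) = \mathcal{O}(n^{3p})
```

Under this assumption, n = 10 eigenvalues per dimension gives M = 10^3 = 1000 basis functions for p = 3, but M = 10^4 for p = 4, so the "much smaller" matrix stops being small as the dimension grows.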
