Privacy-aware Gaussian Process Regression

Rui Tuo, Haoyuan Chen, Raktim Bhattacharya

TL;DR

This work introduces privacy-aware Gaussian process regression: optimally designed correlated Gaussian noise is added to the training data so that the predictive variance at sensitive inputs stays above a prescribed minimum, which protects private information while preserving useful predictions elsewhere. Choosing the noise covariance Σ is shown to be a semidefinite program (SDP), with an explicit closed-form positive semidefinite (PSD) part when the set of sensitive inputs is finite. For continuous privacy constraints over infinite input regions, a kernel-based extension using RKHS inner products yields a uniformly privacy-aware solution that can be approximated on dense finite subsets, which keeps computation practical. The method is demonstrated on a space-object tracking scenario and a real census dataset, where it shows favorable privacy-utility tradeoffs and computational efficiency relative to differential-privacy baselines; the paper also gives guidance on utility validation and scalability.

Abstract

We propose a novel theoretical and methodological framework for Gaussian process regression subject to privacy constraints. The proposed method can be used when a data owner is unwilling to share a high-fidelity supervised learning model built from their data with the public due to privacy concerns. The key idea of the proposed method is to add synthetic noise to the data until the predictive variance of the Gaussian process model reaches a prespecified privacy level. The optimal covariance matrix of the synthetic noise is formulated in terms of semi-definite programming. We also introduce the formulation of privacy-aware solutions under continuous privacy constraints using kernel-based approaches, and study their theoretical properties. The proposed method is illustrated by considering a model that tracks the trajectories of satellites and a real application on a census dataset.
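
The key idea in the abstract, adding noise until the predictive variance reaches a privacy level, can be illustrated with a minimal NumPy sketch. This is not the paper's optimized noise: it uses independent, equal-variance noise (a special case of what the Figure 1 caption calls the diagonal method) and a simple multiplicative search. The squared-exponential kernel, the inputs, the `privacy_level` value, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def predictive_variance(x_train, x_test, noise_cov, length_scale=0.3):
    """GP predictive variance at x_test when the training outputs carry
    additive zero-mean Gaussian noise with covariance noise_cov."""
    K = rbf_kernel(x_train, x_train, length_scale)
    k_star = rbf_kernel(x_train, x_test, length_scale)
    solved = np.linalg.solve(K + noise_cov, k_star)
    prior_var = np.ones(len(x_test))  # k(x, x) = 1 for this kernel
    return prior_var - np.sum(k_star * solved, axis=0)

# Hypothetical setup: 8 training inputs, one sensitive input, one privacy level.
x_train = np.linspace(0.0, 1.0, 8)
x_sensitive = np.array([0.5])
privacy_level = 0.3
identity = np.eye(len(x_train))

# Variance at the sensitive input with almost no noise (the unsecured model).
baseline_var = predictive_variance(x_train, x_sensitive, 1e-6 * identity)[0]

# Assumed search strategy: grow an isotropic noise variance until the
# predictive variance at the sensitive input reaches the privacy level.
noise_scale = 1e-3
while predictive_variance(x_train, x_sensitive, noise_scale * identity)[0] < privacy_level:
    noise_scale *= 1.5

private_var = predictive_variance(x_train, x_sensitive, noise_scale * identity)[0]
print(f"baseline variance: {baseline_var:.4f}")
print(f"noise variance used: {noise_scale:.4f}, private variance: {private_var:.4f}")
```

The loop terminates because the predictive variance increases toward the prior variance (1 for this kernel) as the noise grows. Isotropic noise degrades predictions everywhere, which is the motivation for optimizing a full covariance Σ instead.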

Paper Structure

This paper contains 12 sections, 3 theorems, 18 equations, 9 figures, 4 tables.

Key Result

Theorem 1

Let ${B}$ be a symmetric matrix. Then ${B}^+$ is the only optimal point of the minimization problem
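
Assuming ${B}^+$ denotes the positive semidefinite part of ${B}$ (an interpretation suggested by the TL;DR's mention of a closed-form PSD part, not confirmed by the truncated statement above), it can be computed from an eigendecomposition by zeroing the negative eigenvalues. The function name `psd_part` and the example matrix are hypothetical.

```python
import numpy as np

def psd_part(B):
    """Positive semidefinite part of a symmetric matrix B, obtained by
    zeroing the negative eigenvalues in its eigendecomposition.
    Assumption: this is what the notation B^+ refers to."""
    B = 0.5 * (B + B.T)  # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(B)
    clipped = np.clip(eigvals, 0.0, None)
    # Scale each eigenvector column by its clipped eigenvalue, then recombine.
    return (eigvecs * clipped) @ eigvecs.T

# Example: a symmetric matrix with one negative eigenvalue.
B = np.array([[2.0, 0.0],
              [0.0, -1.0]])
B_plus = psd_part(B)
print(B_plus)
```

For this diagonal example the negative eigenvalue is replaced by zero while the positive one is kept.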

Figures (9)

  • Figure 1: Numerical results for Example 1. Left: Covariance matrix of the synthetic noise of the proposed method. Middle: Comparison between the synthetic noise variances of the method using independent errors (referred to as the diagonal method) and the proposed method, in order to reach the same privacy level. Right: Comparison of the predictive variance of the diagonal method, the proposed method, and the unsecured method (i.e., the ordinary GP regression).
  • Figure 2: Trajectory of the space object defined by the satellite dynamics equations (\ref{satDyn}). Private segments in the state trajectories are shown in red.
  • Figure 3: Privacy-aware GP regression for satellite dynamics. Shaded regions represent the 95% confidence interval of the GP based on $\mu \pm 1.96 \sigma$.
  • Figure 4: GP predictions for satellite dynamics averaged over 100 replications. Green shaded regions represent the 96% confidence bands of the proposed privacy-aware GP, obtained by removing the largest and smallest 2% of values across 100 independent simulations. Blue shaded regions represent the 95% confidence interval of the non-private GP based on $\mu \pm 1.96 \sigma$, where $\mu$ and $\sigma^2$ are the predictive mean and variance of the non-private GP.
  • Figure 5: Mean Squared Error (MSE) between the privacy-aware GP and the non-private GP for satellite dynamics, averaged over 100 replications. Shaded areas show the 96% confidence bands of the MSE values, derived from percentile bounds (2nd--98th percentiles) of 100 simulation replications.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Example 1
  • Theorem 2
  • Theorem 3