Noise-Aware Differentially Private Regression via Meta-Learning
Ossi Räisä, Stratis Markou, Matthew Ashman, Wessel P. Bruinsma, Marlon Tobaben, Antti Honkela, Richard E. Turner
TL;DR
The paper introduces DPConvCNP, a meta-learned regression model that builds a functional differential privacy (DP) mechanism into a ConvCNP, enabling noise-aware predictions with calibrated uncertainty in small-data regimes. It tightens the functional DP bounds of Hall et al. [2013] using Gaussian DP theory and uses a DPSetConv encoder to privatise the context data, delivering $(\epsilon,\delta)$-DP guarantees at meta-test time. Empirically, DPConvCNP matches or surpasses a carefully tuned DP Gaussian Process baseline on Gaussian and non-Gaussian tasks while being significantly faster at test time, and it performs strongly in sim-to-real experiments under privacy constraints. The work also enables amortisation over privacy budgets and shows robust calibration on real-world, privacy-preserving regression tasks. Limitations include not modelling dependencies among target outputs and relying on simulator diversity for sim-to-real transfer.
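The Gaussian DP (GDP) step can be illustrated with a small calculation. This is a minimal sketch, assuming the functional mechanism behaves like a Gaussian mechanism with RKHS sensitivity $\Delta$ and noise scale $\sigma$, so that it is $\mu$-GDP with $\mu = \Delta/\sigma$ (the standard GDP characterisation of Dong, Roth and Su, taken here as an assumption rather than as the paper's exact statement). It converts $\mu$ into an $(\epsilon,\delta)$ guarantee, bisects for the largest $\mu$ meeting a target budget, and compares the resulting $\sigma$ with the classical Gaussian-mechanism bound $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$. The function names `gdp_delta`, `calibrate_mu` and `noise_scales` are hypothetical.

```python
from math import erfc, exp, log, sqrt


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with erfc for accuracy in the lower tail."""
    return 0.5 * erfc(-z / sqrt(2.0))


def gdp_delta(mu: float, eps: float) -> float:
    """Smallest delta for which a mu-GDP mechanism is (eps, delta)-DP
    (Dong, Roth and Su): Phi(-eps/mu + mu/2) - e^eps * Phi(-eps/mu - mu/2)."""
    return std_normal_cdf(-eps / mu + mu / 2) - exp(eps) * std_normal_cdf(-eps / mu - mu / 2)


def calibrate_mu(eps: float, delta: float, lo: float = 1e-6, hi: float = 50.0,
                 iters: int = 200) -> float:
    """Largest mu with gdp_delta(mu, eps) <= delta, found by bisection.
    Bisection is valid because delta(eps) increases with mu for fixed eps."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, eps) <= delta:
            lo = mid
        else:
            hi = mid
    return lo


def noise_scales(sensitivity: float, eps: float, delta: float) -> tuple[float, float]:
    """Noise scale from the GDP calibration (sigma = sensitivity / mu), next to the
    classical bound sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / eps."""
    sigma_gdp = sensitivity / calibrate_mu(eps, delta)
    sigma_classic = sensitivity * sqrt(2.0 * log(1.25 / delta)) / eps
    return sigma_gdp, sigma_classic


if __name__ == "__main__":
    sigma_gdp, sigma_classic = noise_scales(sensitivity=1.0, eps=0.5, delta=1e-5)
    print(f"GDP calibration: sigma = {sigma_gdp:.3f}; classical bound: sigma = {sigma_classic:.3f}")
```

Because the GDP curve describes Gaussian noise exactly, while the classical bound is only a sufficient condition (stated for $\epsilon < 1$), the GDP calibration never needs more noise for the same budget; less noise per release is what makes a tighter analysis valuable when the private dataset is small.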
Abstract
Many high-stakes applications require machine learning models that protect user privacy and provide well-calibrated, accurate predictions. While Differential Privacy (DP) is the gold standard for protecting user privacy, standard DP mechanisms typically impair performance significantly. One approach to mitigating this issue is to pre-train models on simulated data before DP learning on the private data. In this work we go a step further, using simulated data to train a meta-learning model that combines the Convolutional Conditional Neural Process (ConvCNP) with an improved functional DP mechanism of Hall et al. [2013], yielding the DPConvCNP. DPConvCNP learns from simulated data how to map private data to a DP predictive model in one forward pass, and then provides accurate, well-calibrated predictions. We compare DPConvCNP with a DP Gaussian Process (GP) baseline with carefully tuned hyperparameters. The DPConvCNP outperforms the GP baseline, especially on non-Gaussian data, while being much faster at test time and requiring less tuning.
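The central step of the abstract, privatising the context set with a functional DP mechanism before a network maps it to predictions, can be sketched in NumPy. This is a simplified, hypothetical stand-in for the DPSetConv encoder, not the paper's implementation: we assume a unit-variance RBF kernel, context outputs clipped to $[-C, C]$, a fixed grid chosen independently of the data, and substitute-one adjacency. The set convolution produces a density channel $\sum_i k(x_i,\cdot)$ and a data channel $\sum_i \mathrm{clip}(y_i)\,k(x_i,\cdot)$. Swapping one context point changes a channel by $a\,k(x,\cdot) - b\,k(x',\cdot)$, whose squared RKHS norm is $a^2 + b^2 - 2ab\,k(x,x')$; this is at most $2$ for the density channel ($a=b=1$, $k \ge 0$) and at most $(2C)^2$ for the data channel ($|a|,|b| \le C$). Following Hall et al. [2013], each channel is released with an added GP sample path using the same kernel, scaled by its own noise level. The names `dp_setconv`, `gdp_mu`, `sigma_density` and `sigma_data` are illustrative.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, lengthscale: float) -> np.ndarray:
    """Unit-variance RBF kernel matrix with entries exp(-(a_i - b_j)^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_gp(grid: np.ndarray, lengthscale: float, rng: np.random.Generator,
              jitter: float = 1e-6) -> np.ndarray:
    """One sample path of GP(0, k) on the grid; the jitter keeps the Cholesky stable."""
    cov = rbf(grid, grid, lengthscale) + jitter * np.eye(len(grid))
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(grid))


def dp_setconv(x, y, grid, lengthscale, clip, sigma_density, sigma_data, rng):
    """Hypothetical DP set convolution: density and data channels on a fixed grid,
    each perturbed by its own scaled GP sample path (functional Gaussian mechanism)."""
    y = np.clip(y, -clip, clip)                       # bound each point's influence
    weights = rbf(x, grid, lengthscale)               # shape (n_context, n_grid)
    density = weights.sum(axis=0)                     # sum_i k(x_i, t)
    data = (y[:, None] * weights).sum(axis=0)         # sum_i clip(y_i) k(x_i, t)
    density = density + sigma_density * sample_gp(grid, lengthscale, rng)
    data = data + sigma_data * sample_gp(grid, lengthscale, rng)
    return np.stack([density, data])                  # shape (2, n_grid)


def gdp_mu(clip: float, sigma_density: float, sigma_data: float) -> float:
    """Composed GDP parameter: sensitivities sqrt(2) and 2*clip, combined as
    sqrt(mu_density^2 + mu_data^2)."""
    return float(np.hypot(np.sqrt(2.0) / sigma_density, 2.0 * clip / sigma_data))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, size=20)               # toy context inputs
    y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(20)
    grid = np.linspace(-3.0, 3.0, 64)                 # fixed before seeing the data
    h = dp_setconv(x, y, grid, lengthscale=0.5, clip=1.0,
                   sigma_density=2.0, sigma_data=4.0, rng=rng)
    print(h.shape, gdp_mu(clip=1.0, sigma_density=2.0, sigma_data=4.0))
```

In a ConvCNP-style model, the two noisy channels would then be passed to a CNN decoder. Since the grid is fixed in advance and everything after the noise is post-processing, the privacy level of this sketch is the composed value returned by `gdp_mu`; for $C = 1$, $\sigma_{\text{density}} = 2$ and $\sigma_{\text{data}} = 4$ it is $\mu = \sqrt{0.75} \approx 0.866$.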
