Automated Privacy-Preserving Techniques via Meta-Learning

Tânia Carvalho, Nuno Moniz, Luís Antunes

TL;DR

The paper addresses the challenge of privacy-preserving data sharing under strict regulations by automating the discovery of privacy-preserving technique (PPT) configurations. It introduces AUTOPRIV, a three-phase meta-learning framework that automates de-identification through protection, development, and prediction phases, using twin meta-models to forecast predictive performance ($AUC$) and privacy risk (linkability) across a large solution space and produce ranked PPT configurations. The authors demonstrate that bandit-based optimisation (successive halving and hyperband) offers favorable trade-offs between performance, privacy, and speed, with $\epsilon$-PrivateSMOTE showing robust utility while maintaining low privacy risk. The work reduces computational costs and expertise requirements, enabling non-experts to apply privacy-preserving transformations to new data sets, and provides open-source code to reproduce and extend the approach.
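
The bandit-based optimisation mentioned above can be illustrated with a minimal successive-halving sketch. This is a generic illustration of the technique, not the paper's implementation; the function names and the scoring interface are hypothetical:

```python
def successive_halving(configs, score, budget=1, eta=2, rounds=3):
    """Keep the best 1/eta configurations each round, growing the budget.

    `configs` is a list of candidate privacy configurations and
    `score(config, budget)` returns an estimated utility (e.g. AUC)
    evaluated under the given resource budget.
    """
    survivors = list(configs)
    for _ in range(rounds):
        # Evaluate all survivors under the current budget, best first.
        scored = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        # Discard the weaker (eta - 1)/eta fraction of candidates.
        survivors = scored[: max(1, len(scored) // eta)]
        # Survivors are re-evaluated next round with eta times the budget.
        budget *= eta
    return survivors[0]

# Toy usage: each "configuration" is just a number; higher scores better,
# so the routine should converge on the largest candidate.
best = successive_halving(range(10), score=lambda c, b: c)
```

The budget-doubling loop is what makes the search cheap: most configurations are eliminated after only a low-budget evaluation, and only the finalists receive full resources.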

Abstract

Sharing private data for learning tasks is pivotal for transparent and secure machine learning applications. Many privacy-preserving techniques have been proposed for this task, aiming to transform the data while ensuring the privacy of individuals. Some of these techniques have been incorporated into tools, whereas others are accessed through various online platforms. However, such tools require manual configuration, which can be complex and time-consuming. Moreover, they require substantial expertise, potentially restricting their use to those with advanced technical knowledge. In this paper, we propose AUTOPRIV, the first automated privacy-preservation method, which eliminates the need for any manual configuration. AUTOPRIV employs meta-learning to automate the de-identification process, facilitating the secure release of data for machine learning tasks. The main goal is to anticipate the predictive performance and privacy risk of a large set of privacy configurations. We provide a ranked list of the most promising solutions, which are likely to achieve an optimal approximation within a new domain. AUTOPRIV is highly effective as it reduces computational complexity and energy consumption considerably.

Paper Structure

This paper contains 24 sections, 2 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Illustration of the AUTOPRIV method. In the protection phase, a base of original data sets $\mathscr{D}$ is transformed by a synthesis model $G$, generating multiple protected data variants. Each variant is used in the development phase to build a pair of twin meta-models, $\mathcal{M}$ and $\mathcal{L}$. Both models use the meta-feature description of $S_{i_j}$ (the $j$-th variant of the original data set $D_i$), but $\mathcal{M}$ targets the best performance within the learning configurations ($\Psi$) for all privacy configurations ($\mathcal{G}$), while $\mathcal{L}$ targets the privacy risk. In the prediction phase, we extract the meta-features of a new data set and its cross-product with $\mathcal{G}$ to use as a predictor set for the meta-models $\mathcal{M}$ and $\mathcal{L}$.
  • Figure 2: Best predictive performance results (top) and corresponding privacy risk (bottom) for all hyperparameter optimisation approaches.
  • Figure 3: Runtime of the hyperparameter optimisation strategies.
  • Figure 4: Comparison between the competing optimisation strategies and the oracle configuration. Shows the probability that each candidate solution significantly loses, draws or wins against the oracle according to the Bayes Sign Test for predictive performance.
  • Figure 5: Comparison between the best-estimated hyperparameter configuration for each transformation technique obtained through successive halving, alongside the best possible model derived from this process. Shows the probability that each candidate solution significantly draws or loses according to the Bayes Sign Test for predictive performance.
  • ...and 1 more figure
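
As the Figure 1 caption describes, the prediction phase scores each candidate privacy configuration with the twin meta-models and returns a ranked list. The following sketch shows only that final ranking step, under the simplifying assumption that risky configurations are filtered by a threshold before sorting by predicted utility; the configuration names, predicted values, and the `max_risk` parameter are made up for illustration and are not the paper's code:

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    config: str          # a privacy configuration from the solution space
    auc: float           # predicted predictive performance (meta-model M)
    linkability: float   # predicted privacy risk (meta-model L)

def rank_configurations(preds, max_risk=0.2):
    """Drop configurations whose predicted risk exceeds the threshold,
    then rank the remainder by predicted AUC, best first."""
    safe = [p for p in preds if p.linkability <= max_risk]
    return sorted(safe, key=lambda p: p.auc, reverse=True)

# Toy example with invented predictions for three configurations.
preds = [
    Prediction("config-A", auc=0.86, linkability=0.05),
    Prediction("config-B", auc=0.84, linkability=0.30),  # too risky
    Prediction("config-C", auc=0.79, linkability=0.10),
]
ranked = rank_configurations(preds)  # config-A first; config-B excluded
```

A hard risk threshold is one simple way to combine the two meta-model outputs; a multi-objective ranking over the performance-privacy trade-off would be an equally plausible reading of the method.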

Theorems & Definitions (2)

  • Definition 1: Highest-risk selection
  • Definition 2: Linkability (Giomi et al., 2022)
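
To give intuition for the linkability risk that meta-model $\mathcal{L}$ predicts, here is a deliberately simplified nearest-neighbour estimate: the fraction of protected records whose closest original record is the one they were derived from. This is an illustrative toy, not the measure defined by Giomi et al. (2022) or the paper's Definition 2:

```python
import math

def linkability_risk(original, protected):
    """Fraction of protected records re-linked to their source record.

    `original[i]` and `protected[i]` are numeric tuples describing the
    same individual; a record counts as linkable when its nearest
    original record (Euclidean distance) is its own source.
    """
    hits = 0
    for i, p in enumerate(protected):
        dists = [math.dist(p, o) for o in original]
        nearest = min(range(len(original)), key=lambda j: dists[j])
        if nearest == i:
            hits += 1
    return hits / len(protected)

# Toy data: record 0 is barely perturbed (still linkable),
# record 1 is pushed far from its source (no longer linkable).
orig = [(0.0, 0.0), (5.0, 5.0)]
prot = [(0.1, 0.0), (0.5, 0.4)]
risk = linkability_risk(orig, prot)  # 0.5: one of two records re-linked
```

Lower values mean an attacker holding the original records would re-identify fewer individuals from the protected release, which is the sense in which AUTOPRIV treats low linkability as low privacy risk.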