Automated Privacy-Preserving Techniques via Meta-Learning
Tânia Carvalho, Nuno Moniz, Luís Antunes
TL;DR
The paper addresses privacy-preserving data sharing under strict regulations by automating the search for privacy-preserving technique (PPT) configurations. It introduces AUTOPRIV, a three-phase meta-learning framework (protection, development, and prediction) that automates de-identification. Twin meta-models forecast predictive performance (AUC) and privacy risk (linkability) across a large solution space, and the framework outputs a ranked list of PPT configurations. The authors report that bandit-based optimisation (successive halving and Hyperband) offers favourable trade-offs among predictive performance, privacy, and speed, with $\epsilon$-PrivateSMOTE showing robust utility while maintaining low privacy risk. The work reduces computational cost and the need for expert knowledge, enabling non-experts to apply privacy-preserving transformations to new data sets, and it provides open-source code to reproduce and extend the approach.
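To make the bandit-based idea concrete, the following is a minimal sketch of successive halving in plain Python. It is not the authors' implementation: the function name `successive_halving`, the `evaluate(config, budget)` signature, and the toy objective are all assumptions made for illustration. The sketch only shows the general pattern of scoring many candidates cheaply, keeping the best fraction, and spending a larger budget on the survivors.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def successive_halving(
    configs: Sequence[T],
    evaluate: Callable[[T, int], float],
    min_budget: int = 1,
    eta: int = 2,
) -> T:
    """Return the surviving candidate after repeated halving.

    `evaluate(config, budget)` is a hypothetical scoring function
    (higher is better) that may use `budget` as, for example, a
    number of training iterations or samples.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every remaining candidate at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        # Keep the top 1/eta fraction (at least one candidate).
        survivors = scored[: max(1, len(scored) // eta)]
        # Give the survivors a larger budget in the next round.
        budget *= eta
    return survivors[0]


# Toy objective (assumption): the score peaks at 0.3 and ignores the budget.
def toy_score(config: float, budget: int) -> float:
    return -((config - 0.3) ** 2)


candidates = [0.0, 0.1, 0.25, 0.4, 0.5, 0.7, 0.9, 1.0]
best = successive_halving(candidates, toy_score)
print(best)
```

With eight candidates and `eta=2`, the loop runs three rounds (8 to 4 to 2 to 1), so weak candidates are only ever evaluated at the smallest budget. Hyperband builds on the same routine by running it several times with different starting budgets.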
Abstract
Sharing private data for learning tasks is pivotal for transparent and secure machine learning applications. Many privacy-preserving techniques have been proposed for this task, aiming to transform the data while ensuring the privacy of individuals. Some of these techniques have been incorporated into tools, whereas others are accessed through various online platforms. However, such tools require manual configuration, which can be complex and time-consuming. Moreover, they require substantial expertise, potentially restricting their use to those with advanced technical knowledge. In this paper, we propose AUTOPRIV, the first automated privacy-preservation method that eliminates the need for any manual configuration. AUTOPRIV employs meta-learning to automate the de-identification process, facilitating the secure release of data for machine learning tasks. The main goal is to anticipate the predictive performance and privacy risk of a large set of privacy configurations. We provide a ranked list of the most promising solutions, which are likely to achieve an optimal approximation within a new domain. AUTOPRIV is highly effective as it reduces computational complexity and energy consumption considerably.
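The ranking step described above can be illustrated with a short sketch. It assumes two already-trained meta-models, represented here by the hypothetical callables `predict_auc` and `predict_linkability`, and it combines their outputs by averaging the two rank positions. That aggregation rule, the configuration names, and the numbers in the lookup tables are illustrative assumptions, not details taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def rank_configurations(
    candidates: Dict[str, Features],
    predict_auc: Callable[[Features], float],
    predict_linkability: Callable[[Features], float],
) -> List[Tuple[str, float, float]]:
    """Order candidates so that high predicted AUC and low predicted
    linkability come first. Returns (name, auc, linkability) tuples."""
    preds = {
        name: (predict_auc(f), predict_linkability(f))
        for name, f in candidates.items()
    }
    by_auc = sorted(preds, key=lambda n: -preds[n][0])   # best AUC first
    by_risk = sorted(preds, key=lambda n: preds[n][1])   # lowest risk first
    # Assumed aggregation: mean of the two rank positions.
    score = {n: (by_auc.index(n) + by_risk.index(n)) / 2 for n in preds}
    # Ties on the mean rank are broken in favour of lower predicted risk.
    ordered = sorted(preds, key=lambda n: (score[n], preds[n][1]))
    return [(n, preds[n][0], preds[n][1]) for n in ordered]


# Stand-ins for trained meta-models: lookup tables keyed by made-up
# meta-feature vectors. All values are invented for the example.
AUC_TABLE = {(0.1, 1.0): 0.80, (0.5, 3.0): 0.78, (1.0, 5.0): 0.50, (5.0, 1.0): 0.55}
RISK_TABLE = {(0.1, 1.0): 0.30, (0.5, 3.0): 0.05, (1.0, 5.0): 0.02, (5.0, 1.0): 0.40}

candidates = {
    "cfg_a": (0.1, 1.0),
    "cfg_b": (0.5, 3.0),
    "cfg_c": (1.0, 5.0),
    "cfg_d": (5.0, 1.0),
}

ranking = rank_configurations(
    candidates,
    predict_auc=lambda f: AUC_TABLE[tuple(f)],
    predict_linkability=lambda f: RISK_TABLE[tuple(f)],
)
for name, auc, risk in ranking:
    print(f"{name}: predicted AUC={auc:.2f}, predicted linkability={risk:.2f}")
```

Because only the meta-models are queried, no candidate configuration has to be applied to the new data set before the ranked list is produced, which is where the savings in computation would come from under this reading.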
