Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption
Praneeth Vepakomma, Kaustubh Ponkshe
TL;DR
Power Mechanism addresses the lack of formal privacy guarantees for sharing embeddings by learning a privatization network that converts sensitive data into private embeddings, jointly trained with a lightweight utility network. The embeddings are equipped with formal DP guarantees via Lipschitz privacy and a calibration process to $(\epsilon,\delta)$, enabling a single round of communication to a powerful, model-agnostic server. The server can apply any standard ML method (e.g., neural nets, random forests, XGBoost) to the embeddings while preserving privacy, and the approach is shown to improve privacy-utility trade-offs and resource efficiency versus weight-based DP baselines. Empirical results on tabular datasets (Forest Cover, Higgs, Census) demonstrate reduced client compute, robust defense against reconstruction attacks, and favorable convergence of privacy and utility losses, highlighting practical impact for privacy-preserving collaborative learning in real-world, tabular-data scenarios.
Abstract
Traditional collaborative learning approaches are based on sharing model weights between clients and a server. However, schemes based on sharing embeddings (activations) created from the data offer advantages in resource efficiency. Several differentially private methods have been developed for sharing weights, but no such mechanisms exist so far for sharing embeddings. We propose the Power Mechanism, which learns a privacy encoding network in conjunction with a small utility generation network such that the final embeddings generated from it are equipped with formal differential privacy guarantees. These privatized embeddings are then shared with a more powerful server, which learns a post-processing that yields higher accuracy on machine learning tasks. We show that our co-design of collaborative and private learning requires only one round of privatized communication and less compute on the client than traditional methods. The privatized embeddings shared by the client are agnostic to the type of model (deep learning, random forests, or XGBoost) that the server uses to process these activations and complete a task.
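
To make the one-round, model-agnostic workflow concrete, the following is a minimal sketch rather than the method itself. It swaps in a fixed random projection followed by norm clipping and Gaussian noise as a stand-in for the learned privatization network, and the noise scale `sigma` is not calibrated to any $(\epsilon,\delta)$ budget. All function names, parameters, and the synthetic data are hypothetical. The point illustrated is only the data flow: the client releases noisy embeddings once, and the server is free to fit any learner on them (here a nearest-centroid rule and a 1-nearest-neighbour rule).

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    # Assumption: a fixed random linear map stands in for a learned encoder.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def privatize(rows, proj, clip=1.0, sigma=0.2, seed=1):
    # Client side: project each row, clip the embedding to L2 norm <= clip,
    # then add Gaussian noise. sigma is illustrative, not a calibrated value.
    rng = random.Random(seed)
    released = []
    for x in rows:
        z = [sum(w * v for w, v in zip(weights, x)) for weights in proj]
        norm = math.sqrt(sum(v * v for v in z))
        scale = min(1.0, clip / norm) if norm > 0.0 else 1.0
        released.append([v * scale + rng.gauss(0.0, sigma) for v in z])
    return released


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_centroids(embeddings, labels):
    # Server model A: per-class mean of the released embeddings.
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_centroid(centroids, z):
    return min(centroids, key=lambda y: sq_dist(centroids[y], z))


def predict_1nn(embeddings, labels, z):
    # Server model B: 1-nearest-neighbour over the same released embeddings.
    best = min(range(len(embeddings)), key=lambda i: sq_dist(embeddings[i], z))
    return labels[best]


# Synthetic two-class tabular data (hypothetical, for illustration only).
data_rng = random.Random(42)
labels = [i % 2 for i in range(40)]
rows = [[(3.0 if y else -3.0) + data_rng.gauss(0.0, 1.0) for _ in range(4)]
        for y in labels]

proj = make_projection(in_dim=4, out_dim=2)
released = privatize(rows, proj)  # the single client-to-server message

# The server never sees `rows`; both models consume only `released`.
centroids = fit_centroids(released, labels)
preds_a = [predict_centroid(centroids, z) for z in released]
preds_b = [predict_1nn(released, labels, z) for z in released]
```

Because the server only post-processes the released embeddings, swapping one server-side learner for another requires no further communication with the client, which is the property the abstract refers to as model-agnostic consumption.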
