Power Mechanism: Private Tabular Representation Release for Model Agnostic Consumption

Praneeth Vepakomma, Kaustubh Ponkshe

TL;DR

Power Mechanism addresses the lack of formal privacy guarantees for sharing embeddings by learning a privatization network that converts sensitive data into private embeddings, jointly trained with a lightweight utility network. The embeddings carry formal differential privacy (DP) guarantees, obtained via Lipschitz privacy and then calibrated to $(\epsilon,\delta)$-DP, so a single round of communication to a powerful, model-agnostic server suffices. The server can apply any standard ML method (e.g., neural nets, random forests, XGBoost) to the embeddings while preserving privacy, and the approach is shown to improve privacy-utility trade-offs and resource efficiency versus weight-based DP baselines. Empirical results on tabular datasets (Forest Cover, Higgs, Census) show reduced client compute, robust defense against reconstruction attacks, and favorable convergence of the privacy and utility losses, indicating practical value for privacy-preserving collaborative learning on real-world tabular data.

Abstract

Traditional collaborative learning approaches are based on sharing model weights between clients and a server. However, schemes based on sharing embeddings (activations) created from the data offer advantages in resource efficiency. Several differentially private methods have been developed for sharing weights, but no such mechanisms exist so far for sharing embeddings. We propose Power Mechanism, which learns a privacy encoding network in conjunction with a small utility generation network such that the final embeddings it generates carry formal differential privacy guarantees. These privatized embeddings are then shared with a more powerful server, which learns a post-processing step that yields higher accuracy on machine learning tasks. We show that our co-design of collaborative and private learning requires only one round of privatized communication and less compute on the client than traditional methods. The privatized embeddings shared by the client are agnostic to the type of model (deep learning, random forests, or XGBoost) that the server uses to process these activations and complete a task.

Paper Structure

This paper contains 41 sections, 7 theorems, 99 equations, 10 figures, 8 tables, 1 algorithm.

Key Result

Proposition 1

For any $\lambda > 0$, an $\epsilon$-Lipschitz private mechanism $\mathcal{M}$ is $(\epsilon\lambda)$-differentially private under adjacency relation $\mathcal{A}$.
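
The proposition follows in one step from the Lipschitz bound. As a sketch only: assume Definition 3 takes the usual form, in which the log-probability of any output set $S$ changes by at most $\epsilon$ times the input distance $d(x,x')$, and assume $\mathcal{A}$ relates exactly those inputs with $d(x,x') \le \lambda$. Neither definition is reproduced in this summary, so both are assumptions, as are the symbols $d$ and $S$.

$$
\left|\ln \Pr[\mathcal{M}(x)\in S] - \ln \Pr[\mathcal{M}(x')\in S]\right| \;\le\; \epsilon\, d(x,x') \;\le\; \epsilon\lambda
\quad\Longrightarrow\quad
\Pr[\mathcal{M}(x)\in S] \;\le\; e^{\epsilon\lambda}\,\Pr[\mathcal{M}(x')\in S].
$$

Under these assumptions the differential privacy parameter scales linearly with the adjacency radius $\lambda$: halving $\lambda$ halves the resulting $\epsilon\lambda$.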

Figures (10)

  • Figure 1: Schematic illustration of the Power Mechanism for distributed and private learning, which theoretically calibrates and measures the obtained levels of $\epsilon$ and $\delta$ for differential privacy. The calibration is performed after minimizing a proposed privacy loss, which acts as a regularizer alongside the machine learning utility loss.
  • Figure 2: The interactions allow the server to use several machine learning methods, making the system private and fairly model agnostic.
  • Figure 3: Train Histogram of $\epsilon$ for PowerLearn Embeddings
  • Figure 4: Train Histogram of $\epsilon$ for PowerLearn Embeddings
  • Figure 5: Convergence of privacy and utility losses on client model
  • ...and 5 more figures

Theorems & Definitions (25)

  • Definition 1: $\epsilon$-Differential Privacy (Dwork and Roth, 2014)
  • Definition 2: $(\epsilon,\delta)$-Differential Privacy
  • Definition 3: Lipschitz Privacy
  • Definition 4: Local Lipschitz Privacy
  • Proposition 1
  • proof
  • Theorem 1: Equivalence of privacy: Gradient $\ell_2$ bound implies Lipschitz privacy
  • proof
  • Theorem 2
  • proof
  • ...and 15 more