Bayesian Pseudo Posterior Mechanism for Differentially Private Machine Learning

Robert Chew, Matthew R. Williams, Elan A. Segarra, Alexander J. Preiss, Amanda Konet, Terrance D. Savitsky

TL;DR

The paper addresses privacy-preserving machine learning on imbalanced data by introducing SWAG-PPM, a scalable mechanism that combines a risk-weighted pseudo posterior with SWAG’s Gaussian posterior approximation. By downweighting high-risk records through per-record likelihood weights and sampling from an approximate posterior, SWAG-PPM achieves formal privacy guarantees while preserving utility much better than DP-SGD on a highly imbalanced text classification task drawn from OSHA data. The approach shows modest utility loss relative to a non-private baseline and substantial improvement over DP-SGD, with a reweighting extension offering further gains. The work highlights practical implications for official statistics and other domains that require private sharing of trained model parameters, and it identifies finite-sample estimation of delta and extensions to global DP as key areas for future work.

Abstract

Differential privacy (DP) is becoming increasingly important for deployed machine learning applications because it provides strong guarantees for protecting the privacy of individuals whose data is used to train models. However, DP mechanisms commonly used in machine learning tend to struggle on many real-world distributions, including highly imbalanced or small labeled training sets. In this work, we propose SWAG-PPM, a new scalable DP mechanism for deep learning models whose randomized mechanism is a pseudo posterior distribution that downweights by-record likelihood contributions in proportion to their disclosure risks. As a motivating example from official statistics, we demonstrate SWAG-PPM on a workplace injury text classification task using a highly imbalanced public dataset published by the U.S. Occupational Safety and Health Administration (OSHA). We find that SWAG-PPM exhibits only modest utility degradation against a non-private comparator while greatly outperforming the industry-standard DP-SGD for a similar privacy budget.
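
To make the idea of downweighting by-record likelihood contributions concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common pseudo-likelihood form in which each record's log-likelihood is multiplied by a weight in [0, 1]. The function name, the probabilities, and the weights are all hypothetical; in the paper the weights are derived from disclosure risk, whereas here they are picked by hand.

```python
import math


def weighted_log_pseudo_likelihood(log_liks, weights):
    """Sum per-record log-likelihoods, each scaled by a weight in [0, 1].

    A weight of 1 keeps a record's full contribution; a weight near 0
    nearly removes it. (Hypothetical helper, for illustration only.)
    """
    if len(log_liks) != len(weights):
        raise ValueError("log_liks and weights must have the same length")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    return sum(w * ll for w, ll in zip(weights, log_liks))


# Hypothetical probabilities a classifier assigns to each record's true label.
probs = [0.9, 0.8, 0.05]
log_liks = [math.log(p) for p in probs]

# Hypothetical weights: the third record is treated as high-risk.
weights = [1.0, 1.0, 0.2]

unweighted = sum(log_liks)
weighted = weighted_log_pseudo_likelihood(log_liks, weights)
```

In this toy setup the poorly fit third record dominates the unweighted sum, while the weighted sum limits its influence on the objective, which is the sense in which a high-risk record is downweighted.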

Paper Structure

This paper contains 19 sections, 13 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: SWAG Pseudo Posterior Mechanism
  • Figure 2: Training Set Class Distribution
  • Figure 3: SWAG-PPM vs DP-SGD Utility by Class Size
  • Figure 4: Risk-based Weight Density Plot by Top and Bottom Class Size Quartiles

Theorems & Definitions (1)

  • Definition 1