Structure Selection for Fairness-Constrained Differentially Private Data Synthesis

Naeim Ghahramanpour, Mostafa Milani

Abstract

Differential privacy (DP) enables safe data release, with synthetic data generation emerging as a common approach in recent years. Yet standard synthesizers preserve all dependencies in the data, including spurious correlations between sensitive attributes and outcomes. In fairness-critical settings, this reproduces unwanted bias. A principled remedy is to enforce conditional independence (CI) constraints, which encode domain knowledge or legal requirements that outcomes be independent of sensitive attributes once admissible factors are accounted for. DP synthesis typically proceeds in two phases: (i) a measurement step that privatizes selected marginals, often structured via minimum spanning trees (MSTs), and (ii) a reconstruction step that fits a probabilistic model consistent with the noisy marginals. We propose PrivCI, which enforces CI during the measurement step via a CI-aware greedy MST algorithm that integrates feasibility checks into Kruskal's construction under the exponential mechanism, improving accuracy over competing methods. Experiments on standard fairness benchmarks show that PrivCI achieves stronger fidelity and predictive accuracy than prior baselines while satisfying the specified CI constraints.
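
To make the structure-selection idea concrete, the following is a minimal sketch of a Kruskal-style loop that samples each edge with the exponential mechanism and rejects edges that would violate a CI constraint. It is not the authors' PrivCI implementation. Several details are assumptions made for illustration: the function names `separated` and `ci_aware_private_tree`, a single constraint of the form "S independent of Y given a non-empty admissible set", a feasibility test based on graph separation in the growing forest, pairwise scores where higher is better (for example a proxy mutual information), an even split of the privacy budget across edge selections, and a caller-supplied score sensitivity.

```python
import math
import random
from itertools import combinations


def separated(adj, s, y, blockers):
    """Return True if every path from s to y passes through a node in blockers."""
    stack, seen = [s], {s}
    while stack:
        node = stack.pop()
        if node == y:
            return False  # reached y without crossing a blocker
        for nxt in adj[node]:
            if nxt not in seen and nxt not in blockers:
                seen.add(nxt)
                stack.append(nxt)
    return True


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def ci_aware_private_tree(nodes, scores, constraint, epsilon,
                          sensitivity=1.0, rng=None):
    """Greedy tree selection (hypothetical sketch, not the paper's algorithm).

    scores maps each pair from combinations(nodes, 2) to a utility.
    constraint is (s, y, blockers): s and y must stay separated by blockers.
    """
    rng = rng or random.Random()
    s, y, blockers = constraint
    parent = {v: v for v in nodes}
    adj = {v: set() for v in nodes}
    chosen = []
    # Assumption: the budget is split evenly over the n - 1 selections.
    eps_step = epsilon / max(len(nodes) - 1, 1)
    while len(chosen) < len(nodes) - 1:
        candidates = []
        for u, v in combinations(nodes, 2):
            if find(parent, u) == find(parent, v):
                continue  # edge would close a cycle
            # Tentatively add the edge and test the CI constraint.
            adj[u].add(v)
            adj[v].add(u)
            feasible = separated(adj, s, y, blockers)
            adj[u].discard(v)
            adj[v].discard(u)
            if feasible:
                candidates.append((u, v))
        if not candidates:
            break  # no feasible edge left; the result is a forest
        # Exponential mechanism; subtracting the max keeps exp() stable.
        top = max(scores[e] for e in candidates)
        weights = [math.exp(eps_step * (scores[e] - top) / (2 * sensitivity))
                   for e in candidates]
        u, v = rng.choices(candidates, weights=weights, k=1)[0]
        chosen.append((u, v))
        adj[u].add(v)
        adj[v].add(u)
        parent[find(parent, u)] = find(parent, v)
    return chosen
```

In this sketch the feasibility check reads only the current graph and the constraint, never the data, so the only data-dependent step is the sampling over scores. With nodes `["S", "A", "Y", "X"]` and constraint `("S", "Y", {"A"})`, the direct edge between `S` and `Y` is never a candidate, however high its score.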

Paper Structure (22 sections, 15 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Total proxy MI score $q$ of selected tree structures.
  • Figure 2: Distributional fidelity measured by KL divergence and TV distance.
  • Figure 3: Downstream predictive performance (AUC).
  • Figure 4: Conditional mutual information (CMI).
  • Figure 5: Equalized Odds (OD).

Theorems & Definitions (1)

  • Definition 1: DP Data Synthesis with CI Constraints