N-Parties Private Structure and Parameter Learning for Sum-Product Networks

Xenia Heilmann, Ernst Althaus, Mattia Cerrato, Nick Johannes Peter Rassau, Mohammad Sadeq Dousti, Stefan Kramer

TL;DR

The paper tackles privacy-preserving density estimation by introducing an N-party secret-sharing protocol for jointly learning SPN structures and parameters, followed by private inference. It builds a forest of RAT-SPNs, trains and weights structures privately on distributed data, and learns sum and leaf parameters without exposing private data. Private inference is supported via secure multiparty computation over shared parameters and inputs. Empirical results show log-likelihood remains robust under privacy constraints and that the approach scales favorably compared to SMPC baselines for neural networks, with practical runtimes on modest datasets and party counts.
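The semantics of the SPNs that the protocol trains can be illustrated with a minimal, non-private sketch. The SPN below is a hypothetical hand-built example over two binary variables, not a randomly generated RAT-SPN from the paper. Product nodes multiply children with disjoint scopes, sum nodes take normalized weighted mixtures of children with the same scope, and evaluation runs bottom-up in log-space for numerical stability. The function names and all parameter values are assumptions chosen for illustration.

```python
import math

# Hypothetical toy SPN over binary X1, X2 (not a RAT-SPN from the paper):
#   S = 0.6 * (Leaf(X1; 0.9) * Leaf(X2; 0.2)) + 0.4 * (Leaf(X1; 0.1) * Leaf(X2; 0.7))

def bernoulli_log(p: float, x: int) -> float:
    """Log-probability of x in {0, 1} under a Bernoulli leaf with P(X = 1) = p."""
    return math.log(p if x == 1 else 1.0 - p)

def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def spn_log_likelihood(x1: int, x2: int) -> float:
    # Product nodes: children have disjoint scopes {X1} and {X2},
    # so their log-probabilities add.
    prod_a = bernoulli_log(0.9, x1) + bernoulli_log(0.2, x2)
    prod_b = bernoulli_log(0.1, x1) + bernoulli_log(0.7, x2)
    # Sum node: normalized weights over children with the same scope {X1, X2}.
    weights = [0.6, 0.4]
    return logsumexp([math.log(weights[0]) + prod_a,
                      math.log(weights[1]) + prod_b])

print(math.exp(spn_log_likelihood(1, 1)))  # ≈ 0.136
```

Because every sum node is normalized and every product node is decomposable, the probabilities of all four assignments sum to 1. Assuming the forest's structure weights are also normalized, one way to read the weighted forest is as one additional sum node placed over the member SPNs.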

Abstract

A sum-product network (SPN) is a graphical model that allows several types of probabilistic inference to be performed efficiently. In this paper, we propose a privacy-preserving protocol that covers both structure generation and parameter learning of SPNs. Additionally, we provide a protocol for private inference on SPNs after training. To preserve the privacy of the participants, we base our protocol on secret sharing, which guarantees privacy in the honest-but-curious setting even when at most half of the parties cooperate to disclose the data. The protocol uses a forest of randomly generated SPNs, which is trained and weighted privately and can then be used for private inference on data points. Our experiments indicate that preserving the privacy of all participants does not decrease log-likelihood performance on either homogeneously or heterogeneously partitioned data. We furthermore show that our protocol's performance is comparable to current state-of-the-art SPN learners when data is homogeneously partitioned. In terms of runtime and memory usage, we demonstrate that our implementation scales well as the number of parties increases and compares favorably to protocols for neural networks trained to reproduce the input-output behavior of SPNs.
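The property that lets secret-shared protocols aggregate private data is that polynomial (Shamir) shares are additively homomorphic: each party can add the shares it holds locally, and reconstructing the summed shares yields only the total. The sketch below applies this to summing one private integer per party, for example a local count. It is an illustration under stated assumptions, not the paper's protocol: the prime, the helper names `share`, `reconstruct` and `secure_sum`, and the encoding of local statistics as integers are all hypothetical, and real-valued SPN parameters would additionally need a fixed-point encoding that is not shown here.

```python
import secrets

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; the field size is an assumption

def share(secret: int, t: int, n: int) -> list[tuple[int, int]]:
    """Split secret into n Shamir shares; any t of them reconstruct it."""
    # Random polynomial of degree t - 1 with constant term equal to the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for i in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule evaluates f(i) mod PRIME
            y = (y * i + c) % PRIME
        shares.append((i, y))
    return shares

def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret

def secure_sum(local_values: list[int], t: int) -> int:
    n = len(local_values)
    # Each party shares its private value; party k receives the k-th share of every value.
    all_shares = [share(v, t, n) for v in local_values]
    # Party k adds the shares it holds locally, without any communication.
    summed = [(k + 1, sum(all_shares[p][k][1] for p in range(n)) % PRIME)
              for k in range(n)]
    # Any t parties that pool their summed shares recover only the total.
    return reconstruct(summed[:t])

print(secure_sum([12, 30, 7], t=2))  # 49
```

In this sketch, any coalition of fewer than `t` parties sees only uniformly random field elements for each individual value. The result is exact as long as the true total stays below `PRIME`.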

Paper Structure

This paper contains 23 sections, 6 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the whole privacy-preserving training and inference protocol: 1. Generation of the SPN forest. 2. Training of the structures on the local training sets. 3. Evaluation and weighting of the structures on the local validation sets. 4. Private sum parameter and leaf parameter training. 5. Private inference of local data points.
  • Figure 2: Visualization of the $t$-out-of-$n$ polynomial secret sharing scheme for a secret $c_0$ and a given prime $p$.
  • Figure 3: Comparison of the different non-private baselines with our privacy-preserving protocol. Mean log-likelihood over 10 runs on the bnetflix dataset with 3 members and a varying number of sum parameters, on i.i.d. data and on $Dir(10)$-partitioned data. The log-likelihoods of the four distributed settings are all very close; only the non-private, non-distributed setting performs slightly better. Full tables can be found in the Appendix in Tables \ref{tab:all-lls-iid}, \ref{tab:all-lls} and \ref{tab:all-lls-np}.
  • Figure 4: Mean log-likelihood performance of 10 runs on the bnetflix dataset with 3 members and a varying number of sum weights on different dataset partitions. Here, the dataset partitions with a weak level of imbalance lie close to each other, while the log-likelihood decreases visibly for the other partitions. All experimental results can be found in the Appendix in Tables \ref{tab:all-lls-iid}, \ref{tab:all-lls} and \ref{tab:all-lls-np}.
  • Figure 5: Training times with network traffic on nltcs for a varying number of members (408 trainable parameters, 30 local training epochs) with one manager and a latency of 10 ms. Training times for CrypTen (401 trainable parameters, 30 training epochs) are given for comparison (see also Table \ref{tab:time} in the Appendix). The CrypTen running times lie far above those of our protocol.
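The $t$-out-of-$n$ scheme in Figure 2 can be written out as follows. This assumes the common convention of a polynomial of degree $t-1$ with coefficients $c_1, \dots, c_{t-1}$ drawn uniformly from $\mathbb{Z}_p$ and evaluation points $1, \dots, n$; the paper's figure may index the degree or the evaluation points differently. Party $i$ receives the share $s_i$, and any set $T$ of $t$ parties recovers the secret $c_0 = f(0)$ by Lagrange interpolation:

```latex
f(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_{t-1} x^{t-1} \pmod{p},
\qquad s_i = f(i), \quad i = 1, \dots, n

c_0 = f(0) = \sum_{i \in T} s_i \prod_{j \in T,\, j \neq i} \frac{j}{j - i} \pmod{p},
\qquad |T| = t
```

Any $t-1$ shares are consistent with every possible value of $c_0$, which is why a coalition smaller than the threshold learns nothing about the secret.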