Scaling Probabilistic Circuits via Data Partitioning

Jonas Seng, Florian Peter Busch, Pooja Prasad, Devendra Singh Dhami, Martin Mundt, Kristian Kersting

TL;DR

This work introduces Federated Circuits (FCs) as a principled framework to scale Probabilistic Circuits (PCs) through data partitioning, unifying horizontal, vertical, and hybrid federated learning (FL) within a single density-estimation view. A concrete instantiation, Federated PCs (FedPCs), uses tractable leaf density estimators and a one-pass training scheme to enable local learning with minimal communication, while network-side weights are inferred from client data. The approach achieves substantial speedups over centralized training and competitive or superior performance across density estimation and classification tasks in FL settings, demonstrating the practical viability of scalable, distributed probabilistic modeling. Overall, FCs offer a flexible, communication-efficient path to leverage large-scale distributed data for learning expressive probabilistic models with tractable inference capabilities.

Abstract

Probabilistic circuits (PCs) enable us to learn joint distributions over a set of random variables and to perform various probabilistic queries in a tractable fashion. Though the tractability property allows PCs to scale beyond non-tractable models such as Bayesian Networks, scaling training and inference of PCs to larger, real-world datasets remains challenging. To remedy the situation, we show how PCs can be learned across multiple machines by recursively partitioning a distributed dataset, thereby unveiling a deep connection between PCs and federated learning (FL). This leads to federated circuits (FCs), a novel and flexible FL framework that (1) allows one to scale PCs in distributed learning environments, (2) trains PCs faster, and (3) unifies, for the first time, horizontal, vertical, and hybrid FL in one framework by re-framing FL as a density estimation problem over distributed datasets. We demonstrate FCs' capability to scale PCs on various large-scale datasets. We also show FCs' versatility in handling horizontal, vertical, and hybrid FL within a unified framework on multiple classification tasks.

Paper Structure

This paper contains 19 sections, 1 theorem, 3 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Let $\tau_s$ be the total number of induced trees in $s$. Then the output at the root of $s$ can be written as $\sum_{t=1}^{\tau_s} \prod_{(k, j) \in \mathcal{T}_{t E}} w_{k j} \prod_{i=1}^n p_t(X_i = x_i)$, where $\mathcal{T}_t$ is the $t$-th unique induced tree of $s$ and $p_t(X_i)$ is a univariat

Figures (5)

  • Figure 1: Scaling PCs via Federated Circuits. We scale PCs by splitting a dataset $\mathcal{D}$ into a set of $n$ partitions $\{\mathcal{P}_i\}_{i=1}^n$ s.t. $\mathcal{D} = \bigcup_{i=1}^n \mathcal{P}_i$. Each partition is assigned to a client (i.e., machine) $c_j$, and the resulting federated circuit (FC) is learned jointly by a set of clients. As a novel framework for federated learning (FL), FCs can perform horizontal FL (samples are split across clients), vertical FL (features are split across clients), and hybrid FL (mix of horizontal and vertical).
  • Figure 2: One-Pass Training Visualized. (Top) First, the matrix $\mathbf{M}$ is initialized, representing which features are held by which client. Feature subsets are constructed by considering distinct column vectors $\mathbf{u}$ of $\mathbf{M}$ that represent the same set of clients. This forms a mapping indicating which features are modeled as a mixture over clients. (Bottom) This mapping is utilized by forming mixtures over different clients sharing the same feature set via sum nodes. Features that are not shared over multiple clients will be clustered into $K$ clusters (here $K=2$). The FedPC is formed by creating product nodes containing all sum nodes from the previous steps and at least one of the $K$ clusters. Lastly, the root node is inserted.
  • Figure 3: FedPCs speed up training on large-scale image data (64x64 and 32x32 RGB images) due to parallel training on separate data partitions.
  • Figure 4: FCs are competitive with prominent FL methods in all settings. FCs achieve competitive performance on various classification tasks compared to prominent horizontal/vertical FL baselines. FCs also handle the more challenging setting of hybrid FL without performance drops. We report the F1 score (higher is better).
  • Figure 5: FedPCs are communication-efficient. We compare communication cost in megabytes (MB) sent over the network during one full training of a model (0.5M/50M parameters) on a dataset (1M/100M samples) using results from Section 3.4. Results are shown on a log scale. FedPCs significantly reduce the communication cost of training.
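
The first step of the one-pass scheme in Figure 2, grouping features by the set of clients that hold them, can be sketched as follows. This is only a rough illustration, not the authors' implementation: the orientation of $\mathbf{M}$ (rows are clients, columns are features), the function names, and the toy matrix are all assumptions made here. Features whose column marks more than one client would be modeled as a mixture over those clients; the remaining features would go on to the clustering step, which is not shown.

```python
from collections import defaultdict


def group_features_by_clients(M):
    """Group feature indices whose columns in M are identical.

    Assumption: M[c][f] == 1 iff client c holds feature f.
    Returns a dict mapping each distinct column (as a tuple) to its features.
    """
    n_clients = len(M)
    n_features = len(M[0])
    groups = defaultdict(list)
    for f in range(n_features):
        column = tuple(M[c][f] for c in range(n_clients))
        groups[column].append(f)
    return dict(groups)


def split_shared_and_private(groups):
    """Separate feature groups held by several clients from single-client ones."""
    shared, private = {}, {}
    for column, features in groups.items():
        # Recover the client indices marked in this column.
        clients = tuple(c for c, held in enumerate(column) if held)
        target = shared if len(clients) > 1 else private
        target[clients] = features
    return shared, private


# Hypothetical toy setup: 3 clients, 5 features.
M = [
    [1, 1, 1, 0, 0],  # client 0
    [1, 1, 0, 1, 0],  # client 1
    [0, 0, 0, 0, 1],  # client 2
]
groups = group_features_by_clients(M)
shared, private = split_shared_and_private(groups)
print("shared:", shared)
print("private:", private)
```

In this toy setup, features 0 and 1 are held by both client 0 and client 1, so they form the only shared group; features 2, 3, and 4 each belong to a single client.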

Theorems & Definitions (7)

  • Definition 1: Horizontal FL
  • Definition 2: Vertical FL
  • Definition 3: Federated Circuits
  • Definition 4: Federated PC
  • Definition 5
  • Proposition 1: Induced Tree Representation
  • proof
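
The induced tree representation in Proposition 1 can be checked numerically on a small circuit. The sketch below is an illustration under stated assumptions, not code from the paper: the tuple-based node encoding, the Bernoulli leaves, and the toy circuit are hypothetical choices made here. It evaluates the root bottom-up, enumerates every induced tree (one child per sum node, all children per product node), and compares the two results.

```python
from itertools import product as cartesian
from math import isclose, prod

# Hypothetical node encoding, for illustration only:
#   ("leaf", var, p)            Bernoulli leaf with P(X_var = 1) = p
#   ("prod", [child, ...])      product node
#   ("sum", [(w, child), ...])  sum node with weights summing to 1


def leaf_value(node, x):
    _, var, p = node
    return p if x[var] == 1 else 1.0 - p


def evaluate(node, x):
    """Standard bottom-up evaluation of the circuit at assignment x."""
    kind = node[0]
    if kind == "leaf":
        return leaf_value(node, x)
    if kind == "prod":
        return prod(evaluate(c, x) for c in node[1])
    return sum(w * evaluate(c, x) for w, c in node[1])


def induced_trees(node, x):
    """Yield (product of edge weights, product of leaf values) per induced tree."""
    kind = node[0]
    if kind == "leaf":
        yield 1.0, leaf_value(node, x)
    elif kind == "prod":
        # A product node keeps all children: combine one subtree per child.
        per_child = [list(induced_trees(c, x)) for c in node[1]]
        for combo in cartesian(*per_child):
            yield prod(w for w, _ in combo), prod(v for _, v in combo)
    else:
        # A sum node keeps exactly one child, contributing that edge's weight.
        for w, child in node[1]:
            for sub_w, sub_v in induced_trees(child, x):
                yield w * sub_w, sub_v


# Toy circuit over two binary variables X_0 and X_1.
s0 = ("sum", [(0.3, ("leaf", 0, 0.9)), (0.7, ("leaf", 0, 0.2))])
s1 = ("sum", [(0.6, ("leaf", 1, 0.8)), (0.4, ("leaf", 1, 0.1))])
root = ("sum", [(0.4, ("prod", [s0, s1])),
                (0.6, ("prod", [("leaf", 0, 0.5), ("leaf", 1, 0.5)]))])

x = (1, 0)
trees = list(induced_trees(root, x))
direct = evaluate(root, x)
via_trees = sum(w * v for w, v in trees)
print(len(trees), direct, via_trees)
```

Enumerating induced trees is exponential in general, so this is only useful as a sanity check on very small circuits.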