Federated Neuro-Symbolic Learning

Pengwei Xing, Songtao Lu, Han Yu

TL;DR

This work extends neuro-symbolic learning to federated settings by modeling rule distributions as the communication medium between server and clients. It introduces a distribution-coupled bilevel optimization framework, solved with a tailored variational EM that jointly learns a global prior over rules and local posteriors, while enforcing a KL-divergence constraint to mitigate rule heterogeneity across domains. The approach leverages a transformer-based rule generator and a KG-aware E-step to efficiently search and score candidate rules, substantially reducing the rule-search space. Empirical results on synthetic and real-world data demonstrate notable gains in unbalanced training accuracy and unseen testing accuracy, validating FedNSL’s ability to perform personalized NSL under FL while preserving privacy and improving cross-domain generalization.

Abstract

Neuro-symbolic learning (NSL) uses neural networks to model complex symbolic rule patterns as latent variable distributions, which reduces the rule search space and generates unseen rules to improve downstream task performance. Centralized NSL requires directly acquiring data from downstream tasks, which is not feasible in federated learning (FL). To address this limitation, we shift the focus from this one-to-one interactive neuro-symbolic paradigm to a one-to-many Federated Neuro-Symbolic Learning framework (FedNSL) with latent variables as the FL communication medium. Built on our novel reformulation of the NSL theory, FedNSL identifies and addresses rule distribution heterogeneity through a simple and effective Kullback-Leibler (KL) divergence constraint on the rule distribution that is applicable under the FL setting. It further adjusts variational expectation maximization (V-EM), on theoretical grounds, to reduce the rule search space across domains. This is the first incorporation of distribution-coupled bilevel optimization into FL. Extensive experiments on both synthetic and real-world data demonstrate significant advantages of FedNSL over five state-of-the-art methods. It outperforms the best baseline by 17% and 29% in terms of unbalanced average training accuracy and unseen average testing accuracy, respectively.
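
To make the KL-divergence constraint on rule distributions concrete, the following is a minimal sketch and not the authors' implementation. It assumes rule distributions are discrete probability vectors over a shared set of candidate rules, that the penalty is KL(local posterior || global prior), and that it is added to a client's task loss with a scalar coefficient. The function names, the KL direction, and the toy numbers are all hypothetical.

```python
import math


def normalize(weights):
    """Turn non-negative weights into a probability vector."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same candidate rules."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same set of rules")
    # Terms with p_i == 0 contribute nothing; eps guards against log(0) in q.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def penalized_local_objective(task_loss, local_posterior, global_prior, kl_coeff):
    """Hypothetical client objective: task loss plus a KL pull toward the prior."""
    return task_loss + kl_coeff * kl_divergence(local_posterior, global_prior)


# Toy setting (assumed): three candidate rules, one server prior, two clients.
global_prior = normalize([1.0, 1.0, 1.0])
client_a = normalize([0.8, 0.1, 0.1])  # strongly prefers rule 0
client_b = normalize([0.4, 0.3, 0.3])  # stays close to the prior

kl_a = kl_divergence(client_a, global_prior)
kl_b = kl_divergence(client_b, global_prior)

# The more a client's rule distribution departs from the prior, the larger
# the penalty it pays for the same task loss and coefficient.
loss_a = penalized_local_objective(1.0, client_a, global_prior, kl_coeff=0.5)
loss_b = penalized_local_objective(1.0, client_b, global_prior, kl_coeff=0.5)
```

In this sketch only probability vectors over rules cross the client boundary, which mirrors the idea of using rule distributions rather than raw data as the communication medium. A larger `kl_coeff` pulls clients toward a shared rule distribution, while a smaller one leaves more room for personalization.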

Paper Structure

This paper contains 34 sections, 4 theorems, 24 equations, 5 figures, 2 tables, 2 algorithms.

Key Result

Lemma 3.1

Given that $z_i, \forall i$ are i.i.d. with $\bar{z}$, the overall log-likelihood function $\log \left( p_{w_{1:n},\theta}(a_{1:n}|q_{1:n},\mathcal{G}_{1:n}) \right)$ can be rewritten as where $\mathcal{L}_{\mathrm{ELBO}}(\tilde{p}(\bar{z}),p_{\theta,w_{1:n}}(\bar{z}))$ is the evidence lower bound (ELBO) of the log-likelihood function, and $D_{\mathrm{KL}}(\tilde{p}(z_i)||p_{w_{i},\theta}(z_i|q_{

Figures (5)

  • Figure 1: A neuro-symbolic PFL example (a) and a corresponding KG-based rule learning scenario (b). Example (a) illustrates how a global rule generator and multiple personalized rule scorers cooperate to tackle rule personalization without exposing local data, by transmitting rule distribution probabilities. The KG-based workflow (b) demonstrates the V-EM mechanism, which employs maximization of the $\mathrm{ELBO}$ and minimization of $\mathrm{KL}$-divergence constraint-1 (KL-DC1) for inductive rule reference (blue part in (b)). It also incorporates a $\mathrm{KL}$-divergence constraint-2 (KL-DC2) to diminish rule heterogeneity (orange part in (b)).
  • Figure 2: Group (a) presents the numerical experiment results. The first row features (a1), (a2) and (a3), which respectively show the training accuracy of the classifiers for client 1, client 2 and the average results. The second row features (a4), (a5) and (a6), which respectively show the unseen testing accuracy for the classifiers of client 1, client 2 and the average results. The third row shows performance comparison results under different ratios of training-testing data heterogeneity: "0% (homo)" means training and testing data have the same distribution, while "33% (hetero)" and "50% (hetero)" indicate that 33% and 50% of the unseen testing data, respectively, follow a different distribution from the training data. Group (b) shows the real-data experiment results, including F1-scores in (b1), logic accuracy in (b2) on both the unseen and seen testing data with and without KL-divergence rule distribution constraints (denoted by "W/O. KL" and "W. KL"), and (b3) illustrates how different coefficients of KL-divergence constraint affect the personalization performance.
  • Figure 3: The path-based score function reduces fluctuations more effectively than the graph-based score function.
  • Figure 4: Upper-level first-round training loss when adding posterior samples at different rates.
  • Figure 5: F1 and Logic Acc curves under different learning rates of the rule generator.

Theorems & Definitions (8)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • proof
  • proof
  • proof
  • proof