Federated Causal Discovery from Heterogeneous Data

Loka Li; Ignavier Ng; Gongxu Luo; Biwei Huang; Guangyi Chen; Tongliang Liu; Bin Gu; Kun Zhang

TL;DR

The paper tackles federated causal discovery from heterogeneous, decentralized data, a setting in which standard centralized approaches cannot be applied. It introduces FedCDH, a nonparametric, privacy-preserving framework that uses a surrogate domain variable to model distribution shifts across clients. The framework has two components: FCIT, a federated conditional independence test for skeleton discovery, and FICP, a federated independent change principle for determining causal directions. Both identify causal structure across clients using only summary statistics. Key contributions include the design of FCIT and FICP, their implementation via summary statistics and random features, and empirical validation on synthetic data (linear Gaussian and general functional models) and on real fMRI and stock market data, where FedCDH outperforms baseline methods. The approach enables scalable, privacy-conscious causal discovery in domains such as healthcare and finance, with potential extensions to vertically partitioned data and more efficient privacy-preserving computations.

Abstract

Conventional causal discovery methods rely on centralized data, which is inconsistent with the decentralized nature of data in many real-world situations. This discrepancy has motivated the development of federated causal discovery (FCD) approaches. However, existing FCD methods may be limited by their potentially restrictive assumptions of identifiable functional causal models or homogeneous data distributions, narrowing their applicability in diverse scenarios. In this paper, we propose a novel FCD method attempting to accommodate arbitrary causal models and heterogeneous data. We first utilize a surrogate variable corresponding to the client index to account for the data heterogeneity across different clients. We then develop a federated conditional independence test (FCIT) for causal skeleton discovery and establish a federated independent change principle (FICP) to determine causal directions. These approaches involve constructing summary statistics as a proxy of the raw data to protect data privacy. Owing to the nonparametric properties, FCIT and FICP make no assumption about particular functional forms, thereby facilitating the handling of arbitrary causal models. We conduct extensive experiments on synthetic and real datasets to show the efficacy of our method. The code is available at https://github.com/lokali/FedCDH.git.
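The idea of using summary statistics as a proxy for raw data can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that every client applies the same random Fourier feature map (shared seed) and sends only its sample size, feature sum, and feature second-moment matrix. The function names, feature count, and bandwidth are hypothetical choices made for illustration.

```python
import numpy as np

def random_fourier_features(x, n_features=8, bandwidth=1.0, seed=0):
    """Map samples x of shape (n, d) to random Fourier features of shape (n, n_features).

    Assumption: all clients share the seed, so they draw identical w and b.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    w = rng.normal(scale=1.0 / bandwidth, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

def client_summary(x):
    """Computed locally: sample size, feature sum, feature second moment."""
    phi = random_fourier_features(x)
    return x.shape[0], phi.sum(axis=0), phi.T @ phi

def server_covariance(summaries):
    """Computed on the server: covariance of the pooled features, without raw data."""
    n = sum(s[0] for s in summaries)
    total = sum(s[1] for s in summaries)
    second = sum(s[2] for s in summaries)
    mean = total / n
    return second / n - np.outer(mean, mean)

# Three hypothetical clients with different sample sizes and shifted distributions.
rng = np.random.default_rng(42)
clients = [rng.normal(loc=shift, size=(n_k, 2))
           for shift, n_k in [(0.0, 50), (1.5, 80), (-1.0, 30)]]

federated = server_covariance([client_summary(x) for x in clients])

# Reference value computed centrally, only to check the sketch.
pooled = np.cov(random_fourier_features(np.vstack(clients)), rowvar=False, bias=True)
print(np.allclose(federated, pooled))
```

The server recovers the same covariance matrix it would obtain from the pooled features, while each client's message has a size that depends on the number of random features rather than on its number of samples.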

Paper Structure (40 sections, 17 theorems, 56 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Revisiting Causal Discovery from Heterogeneous Data
Federated Causal Discovery from Heterogeneous Data
Federated Conditional Independent Test (FCIT)
Federated Independent Change Principle (FICP)
Implementing FCIT and FICP with Summary Statistics
Communication Costs and Secure Computations
Experiments
Discussion and Conclusion
Summary of Symbols
Related Works
Details about the Characterization
Characterization of Conditional Independence
Characterization of Independent Change
Proofs
...and 25 more sections

Key Result

Lemma 1

Let $\ddot{X} \triangleq (X,Z), k_{\mathcal{\ddot{X}}}\triangleq k_{\mathcal{X}} k_{\mathcal{Z}}$, and $\mathcal{H_{\ddot{X}}}$ be the RKHS corresponding to $k_{\mathcal{\ddot{X}}}$. Assume that $\mathcal{H_X} \subset L^2_X, \mathcal{H_Y} \subset L^2_Y, \mathcal{H_Z} \subset L^2_Z$. Further assume t

Figures (11)

Figure 1: An illustration where the causal models of variables $V_i$ and $V_j$ are changing across domains. (a) the graph with unobserved domain-changing factors $\psi_{\ell}(\mho)$, $\theta_i(\mho)$ and $\theta_j(\mho)$; (b) the simplified graph with the surrogate variable $\mho$.
Figure 2: Overall framework of $\operatorname{FedCDH}$. Left: The clients will send their sample sizes and local covariance tensors to the server, for constructing the summary statistics. The federated causal discovery will be implemented on the server. Right Top: Relying on the summary statistics, we propose two submodules: federated conditional independence test and federated independent change principle, for skeleton discovery and direction determination. Right Bottom: An example of FCD with three observed variables is illustrated, where the causal modules related to $V_2$ and $V_3$ are changing.
Figure 3: Results of synthetic dataset on linear Gaussian model. By rows, we evaluate varying number of variables $d$, varying number of clients $K$, and varying number of samples $n_k$. By columns, we evaluate Skeleton $F_1$ ($\uparrow$), Skeleton SHD ($\downarrow$), Direction $F_1$ ($\uparrow$) and Direction SHD ($\downarrow$).
Figure A1: Given that $X \perp\!\!\!\perp Y | Z$, we could introduce the independence between $R_{\ddot{X}|Z}$ and $R_{Y|Z}$.
Figure A2: Results of the synthetic dataset on (a) linear Gaussian model and (b) general functional model. By rows in each subfigure, we evaluate varying number of variables $d$, varying number of clients $K$, and varying number of samples $n_k$. By columns in each subfigure, we evaluate Skeleton Precision ($\uparrow$), Skeleton Recall ($\uparrow$), Direction Precision ($\uparrow$) and Direction Recall ($\uparrow$).
...and 6 more figures

Theorems & Definitions (18)

Lemma 1: Characterization of CI with Partial Cross-covariance (Fukumizu et al., 2007)
Lemma 2: Independent Change Principle (Huang et al., 2020)
Lemma 3: Characterization of Conditional Independence
Theorem 4: Federated Conditional Independent Test
Theorem 5: Null Distribution Approximation
Theorem 6: Federated Independent Change Principle
Lemma 7: Estimating Covariance Matrix from Kernel Matrix
Theorem 8: Sufficiency of Summary Statistics
Lemma 9: Characteristic Kernel (Fukumizu et al., 2007)
Lemma 10: Characterization of CI based on Partial Association (Daudin, 1980)
...and 8 more
