Personalized Binomial DAGs Learning with Network Structured Covariates
Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar
TL;DR
This work studies causal discovery for multivariate count data under user heterogeneity and social network structure by introducing personalized Binomial DAG models. It proposes a four-step learning algorithm that (i) embeds network covariates into a low-dimensional representation, (ii) estimates node neighborhoods via penalized kernel smoothing, (iii) determines a DAG order using an overdispersion score, and (iv) recovers the DAG with a penalized neighborhood procedure. The approach supports both linear and nonlinear embeddings (including Graph Auto-Encoders) and outperforms a state-of-the-art method that ignores heterogeneity on both synthetic data and real web-visit data from Alipay collected during the COVID-19 pandemic. On the real data, the learned directed relationships are interpretable and consistent with practical operational strategies, for example linking city services to transport payments and to green health codes. Overall, the paper advances causal discovery for dependent count data by jointly modeling DAG structure, heterogeneity across observations, and network structure, with both methodological and applied benefits.
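To make step (iii) concrete, here is a minimal sketch of overdispersion-based ordering in the homogeneous special case: it assumes every variable is Binomial with the same known number of trials N and ignores covariates and network embeddings entirely, so it is not the personalized procedure. The functions `overdispersion_score` and `estimate_order`, and the estimator that groups rows by exact values of the already-ordered nodes, are hypothetical illustrations. At each step the node whose conditional variance best matches the Binomial relation Var = E(1 - E/N) is placed next.

```python
import numpy as np


def overdispersion_score(X, j, cond, n_trials, min_group_size=30):
    """Size-weighted average over groups of Var(X_j | X_cond) - E(X_j | X_cond) * (1 - E / N).

    Rows are grouped by their exact values on the columns in `cond`, which is
    only sensible because count data are discrete (hypothetical estimator).
    """
    keys = [tuple(row) for row in X[:, cond]] if cond else [()] * X.shape[0]
    groups = {}
    for i, k in enumerate(keys):
        groups.setdefault(k, []).append(i)
    total, weight = 0.0, 0
    for idx in groups.values():
        if len(idx) < min_group_size:
            continue  # too few rows for a stable variance estimate
        xj = X[idx, j].astype(float)
        m, v = xj.mean(), xj.var(ddof=1)
        # A Binomial(N, p) count has Var = E * (1 - E / N); the gap measures overdispersion.
        total += len(idx) * (v - m * (1.0 - m / n_trials))
        weight += len(idx)
    return total / weight if weight else np.inf


def estimate_order(X, n_trials, min_group_size=30):
    """Greedily append the remaining node that looks least overdispersed
    given all nodes already placed in the order."""
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        scores = [overdispersion_score(X, j, order, n_trials, min_group_size)
                  for j in remaining]
        best = remaining[int(np.argmin(scores))]
        order.append(best)
        remaining.remove(best)
    return order


# Simulated chain X1 -> X2 -> X3 with N = 4 trials per variable.
rng = np.random.default_rng(0)
N, n = 4, 20000
x1 = rng.binomial(N, 0.5, size=n)
x2 = rng.binomial(N, 0.1 + 0.2 * x1)
x3 = rng.binomial(N, 0.9 - 0.2 * x2)
X = np.column_stack([x3, x1, x2])  # columns deliberately shuffled
print(estimate_order(X, N))  # expected [1, 2, 0]: X1, then X2, then X3
```

Conditioning on every previously ordered node makes the groups sparse quickly as the number of variables grows; restricting the conditioning set, which is one reason a neighborhood-estimation step such as (ii) is useful, keeps the variance estimates stable.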
Abstract
Causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, which are widely used in many areas. Causal discovery aims to recover the DAG structure from observational data. This paper focuses on causal discovery with multivariate count data. We are motivated by real-world web visit data, which record individual users' visits to multiple websites. Building a causal diagram can help explain how users transition between websites and thereby inform operational strategy. One modeling challenge is user heterogeneity: users with different backgrounds exhibit different behaviors. In addition, social network connections can lead friends to behave similarly. We introduce personalized Binomial DAG models to address both heterogeneity and network dependence between observations, two features common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and exploits the variance-mean relation to determine the ordering. Simulations show that our algorithm outperforms state-of-the-art competitors on heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.
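Why a variance-mean relation can reveal the ordering follows from the law of total variance. As a simplified sketch, assume $X_j \mid \mathrm{pa}(j) \sim \mathrm{Binomial}(N, p_j)$ with a known number of trials $N$ and a success probability $p_j$ that depends only on the parents $\mathrm{pa}(j)$ (this notation is ours, and covariates are ignored):

```latex
\begin{align*}
\operatorname{Var}\bigl(X_j \mid \mathrm{pa}(j)\bigr)
  &= N p_j (1 - p_j)
   = \mathbb{E}\bigl[X_j \mid \mathrm{pa}(j)\bigr]\Bigl(1 - \tfrac{\mathbb{E}[X_j \mid \mathrm{pa}(j)]}{N}\Bigr), \\
\operatorname{Var}(X_j)
  &= \mathbb{E}\bigl[N p_j (1 - p_j)\bigr] + \operatorname{Var}(N p_j)
   = N\,\mathbb{E}[p_j] - N\,\mathbb{E}[p_j^2] + N^2 \operatorname{Var}(p_j), \\
\mathbb{E}[X_j]\Bigl(1 - \tfrac{\mathbb{E}[X_j]}{N}\Bigr)
  &= N\,\mathbb{E}[p_j] - N\,\bigl(\mathbb{E}[p_j]\bigr)^2, \\
\operatorname{Var}(X_j) - \mathbb{E}[X_j]\Bigl(1 - \tfrac{\mathbb{E}[X_j]}{N}\Bigr)
  &= N(N-1)\operatorname{Var}(p_j) \;\ge\; 0.
\end{align*}
```

A source node has constant $p_j$ in this simplified setting, so it matches the Binomial variance exactly, while a node whose parents shift $p_j$ is strictly overdispersed whenever $N \ge 2$. For $N = 1$ (Bernoulli data) the gap vanishes, so this relation carries no ordering information in that case.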
