Personalized Binomial DAGs Learning with Network Structured Covariates

Boxin Zhao, Weishi Wang, Dingyuan Zhu, Ziqi Liu, Dong Wang, Zhiqiang Zhang, Jun Zhou, Mladen Kolar

TL;DR

This work develops causal discovery for multivariate count data under user heterogeneity and social network structure by introducing personalized Binomial DAG models. It proposes a four-step learning algorithm that (i) embeds network covariates into a low-dimensional representation, (ii) estimates node neighborhoods via penalized kernel smoothing, (iii) determines a DAG ordering using an overdispersion score, and (iv) recovers the DAG with a penalized neighborhood procedure. The approach supports both linear and nonlinear embeddings (including Graph Auto-Encoders) and outperforms a state-of-the-art method that ignores heterogeneity, on both synthetic data and real web-visit data from Alipay during the COVID-19 era. The recovered directional relationships are interpretable and align with practical operation strategies, such as linking city services to transport payments and green health codes. Overall, the paper advances causal discovery for dependent count data by jointly modeling DAG structure, observation heterogeneity, and network structure.

Abstract

The causal dependence in data is often characterized by Directed Acyclic Graphical (DAG) models, which are widely used in many areas. Causal discovery aims to recover the DAG structure from observational data. This paper focuses on causal discovery with multivariate count data. We are motivated by real-world web visit data, which record individual user visits to multiple websites. Building a causal diagram can help explain how users transition between websites and can inform operational strategy. One modeling challenge is user heterogeneity, as users with different backgrounds exhibit varied behaviors. Additionally, social network connections can result in similar behaviors among friends. We introduce personalized Binomial DAG models to address heterogeneity and network dependency between observations, both of which are common in real-world applications. To learn the proposed DAG model, we develop an algorithm that embeds the network structure into a dimension-reduced covariate, learns each node's neighborhood to reduce the DAG search space, and exploits the variance-mean relation to determine the ordering. Simulations show that our algorithm outperforms state-of-the-art competitors on heterogeneous data. We demonstrate its practical usefulness on a real-world web visit dataset.
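
To make the variance-mean idea concrete, here is a minimal sketch under simplifying assumptions; it is not the paper's algorithm. It ignores covariates, heterogeneity, and the network, and uses a two-node toy graph. For a Binomial variable with a fixed number of trials $N$ and conditional mean $\mu$, the conditional variance is $\mu - \mu^2/N$. By the law of total variance, the marginal variance is $m - m^2/N + (1 - 1/N)\,\mathrm{Var}(\mu)$ with $m = E[\mu]$. A root node (constant $\mu$) therefore has zero excess variance, while a node whose mean depends on a parent is overdispersed. The function names, the logistic link, and all parameter values below are hypothetical choices for illustration.

```python
import math
import random


def draw_binomial(rng, n_trials, p):
    """Draw one Binomial(n_trials, p) sample as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n_trials))


def overdispersion_score(xs, n_trials):
    """Sample variance minus the Binomial variance implied by the sample mean.

    Close to zero when xs is Binomial(n_trials, p) with a constant p;
    positive when p varies across observations (e.g. through a parent).
    """
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    return var - (m - m * m / n_trials)


rng = random.Random(0)   # fixed seed so the toy run is reproducible
N = 10                   # number of Binomial trials (assumed known)
n = 20000                # number of simulated observations

# Toy DAG X1 -> X2: X1 is a root, X2's success probability depends on X1
# through a logistic link (an illustrative assumption).
x1 = [draw_binomial(rng, N, 0.3) for _ in range(n)]
x2 = [draw_binomial(rng, N, 1.0 / (1.0 + math.exp(-(-1.5 + 0.4 * a)))) for a in x1]

score_root = overdispersion_score(x1, N)
score_child = overdispersion_score(x2, N)
print(f"root score:  {score_root:.3f}")
print(f"child score: {score_child:.3f}")
```

In this toy setting the node with the smaller score would be placed first in the ordering. The method described above additionally conditions on the embedded covariates and on the estimated neighborhoods, which this sketch omits.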

Paper Structure

This paper contains 17 sections, 33 equations, 3 figures, 1 table, and 2 algorithms.

Figures (3)

  • Figure 1: Comparison of our algorithm with QVF [park2017learning] on DAG learning accuracy under the linear setup. Both algorithms were run on $10$ independent realizations for each combination of $d_X$ and $n$. The solid dot shows the mean, and the error bar shows one standard deviation across the $10$ runs.
  • Figure 2: Comparison of our algorithm with QVF [park2017learning] on DAG learning accuracy under the nonlinear setup. Both algorithms were run on $10$ independent realizations for each combination of $d_X$ and $n$. The solid dot shows the mean, and the error bar shows one standard deviation across the $10$ runs.
  • Figure 3: Estimated DAG for the $17$ Alipay websites.