Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

Serina Chang, Frederic Koehler, Zhaonan Qu, Jure Leskovec, Johan Ugander

TL;DR

The paper addresses the problem of inferring a dynamic network from time-varying marginals and a time-aggregated network. It introduces a biproportional Poisson generative model and proves that IPF recovers the model's maximum likelihood estimates. It establishes structure-dependent error bounds on the MLE and a high-probability well-posedness result, which together connect network connectivity to estimation accuracy. To address non-convergence on sparse data, the authors propose ConvIPF, a principled algorithm, inspired by convex procedures, that minimally augments the network to guarantee convergence. Experiments on synthetic data, mobility data, and CitiBike ground-truth networks show that IPF, aided by ConvIPF and the proposed model, can accurately infer hourly networks and improve over standard baselines in practical mobility settings.

Abstract

A common network inference problem, arising from real-world data constraints, is how to infer a dynamic network from its time-aggregated adjacency matrix and time-varying marginals (i.e., row and column sums). Prior approaches to this problem have repurposed the classic iterative proportional fitting (IPF) procedure, also known as Sinkhorn's algorithm, with promising empirical results. However, the statistical foundation for using IPF has not been well understood: under what settings does IPF provide principled estimation of a dynamic network from its marginals, and how well does it estimate the network? In this work, we establish such a setting, by identifying a generative network model whose maximum likelihood estimates are recovered by IPF. Our model both reveals implicit assumptions on the use of IPF in such settings and enables new analyses, such as structure-dependent error bounds on IPF's parameter estimates. When IPF fails to converge on sparse network data, we introduce a principled algorithm that guarantees IPF converges under minimal changes to the network structure. Finally, we conduct experiments with synthetic and real-world data, which demonstrate the practical value of our theoretical and algorithmic contributions.
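
The IPF procedure described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `ipf`, the parameters `max_iter` and `tol`, and the toy inputs `x_bar`, `p`, and `q` are all assumptions made for this example. It alternately rescales the rows and columns of the time-aggregated matrix until the marginals match the targets for one time step.

```python
def ipf(x_bar, p, q, max_iter=1000, tol=1e-10):
    """Alternately scale rows and columns of x_bar toward marginals p and q.

    Returns the fitted matrix and the row/column scaling factors (d0, d1),
    so that fitted[i][j] == d0[i] * x_bar[i][j] * d1[j].
    """
    m, n = len(x_bar), len(x_bar[0])
    d0 = [1.0] * m  # row scaling factors
    d1 = [1.0] * n  # column scaling factors
    for _ in range(max_iter):
        # Row step: pick d0 so every row sum equals its target in p.
        for i in range(m):
            row = sum(x_bar[i][j] * d1[j] for j in range(n))
            d0[i] = p[i] / row if row > 0 else 0.0
        # Column step: pick d1 so every column sum equals its target in q.
        for j in range(n):
            col = sum(d0[i] * x_bar[i][j] for i in range(m))
            d1[j] = q[j] / col if col > 0 else 0.0
        # Columns now match exactly; stop once the rows also match.
        err = max(
            abs(sum(d0[i] * x_bar[i][j] * d1[j] for j in range(n)) - p[i])
            for i in range(m)
        )
        if err < tol:
            break
    fitted = [[d0[i] * x_bar[i][j] * d1[j] for j in range(n)] for i in range(m)]
    return fitted, d0, d1


# Toy inputs (made up for illustration): a sparse aggregated matrix and
# target row sums p and column sums q with equal totals.
x_bar = [[4, 2, 0],
         [1, 3, 2],
         [0, 1, 5]]
p = [3, 2, 4]
q = [2, 3, 4]

fitted, d0, d1 = ipf(x_bar, p, q)
```

Because each step only multiplies entries by scaling factors, zero entries of `x_bar` stay zero in `fitted`. This is why sparse inputs can make the target marginals unreachable and prevent convergence.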

Paper Structure

This paper contains 77 sections, 14 theorems, 131 equations, 14 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

Assume that the matrix balancing problem with $\bar{X}$, $p^{(t)}$, and $q^{(t)}$ has a finite solution $(D^0,D^1)$. Then $d^0$ and $d^1$ are limits of the IPF iterations if and only if $\hat{u} = \log d^0$ and $\hat{v} = -\log d^1$ are solutions to the maximum likelihood estimation problem of eqn:m. Moreover, maximizing $\ell(u,v)$ is equivalent to the maximum likelihood estimation of a Poisson re
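
The link between IPF limits and maximum likelihood estimates can be seen from the first-order conditions of the likelihood. The derivation below is a sketch under an assumption: the likelihood is not reproduced in this excerpt, so it takes the biproportional Poisson model to be $X_{ij}^{(t)} \sim \mathrm{Poisson}(e^{u_i - v_j}\bar{X}_{ij})$ with independent entries, which is the form consistent with $\hat{u} = \log d^0$ and $\hat{v} = -\log d^1$.

```latex
% Assumed log-likelihood (constants in u, v dropped), summing over entries with \bar{X}_{ij} > 0:
\ell(u, v) = \sum_{i,j} \left[ X_{ij}^{(t)} (u_i - v_j) - e^{u_i - v_j} \bar{X}_{ij} \right]

% Stationarity in u_i and v_j, using p_i^{(t)} = \sum_j X_{ij}^{(t)} and q_j^{(t)} = \sum_i X_{ij}^{(t)}:
\frac{\partial \ell}{\partial u_i} = p_i^{(t)} - \sum_j e^{u_i - v_j} \bar{X}_{ij} = 0
\qquad
\frac{\partial \ell}{\partial v_j} = -q_j^{(t)} + \sum_i e^{u_i - v_j} \bar{X}_{ij} = 0

% Substituting d^0_i = e^{u_i} and d^1_j = e^{-v_j}:
\sum_j d^0_i \bar{X}_{ij} d^1_j = p_i^{(t)}
\qquad
\sum_i d^0_i \bar{X}_{ij} d^1_j = q_j^{(t)}
```

Under this assumed form, the stationarity conditions state that $D^0 \bar{X} D^1$ has row sums $p^{(t)}$ and column sums $q^{(t)}$, which is the matrix balancing condition that the IPF limits satisfy.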

Figures (14)

  • Figure 1: Comparing inferred parameters from IPF (x-axis) against inferred parameters from Poisson regression (left y-axis, blue) and true parameters from Poisson model (right y-axis, orange). Grey bars indicate 95% CIs from Poisson regression. $N$ represents the number of nonzero entries in $X$, so $N$ is halved with 50% sparsity. Under both networks, estimated parameters from IPF and Poisson regression are perfectly aligned (Theorem \ref{thm:model}), but their estimation quality worsens with greater sparsity (Theorem \ref{thm:mse}).
  • Figure 2: Comparing sparsity rate in $\bar{X}$ to number of IPF iterations (left), bound on MLE's expected estimation error, without constants (middle), and observed $\ell_2$ error of IPF estimates (right). Lines represent mean and shaded region represents 95% CIs over 1000 trials.
  • Figure 3: Cosine similarity between ground-truth hourly networks from bikeshare data and inferred networks from IPF and baselines.
  • Figure 2.1: Summary of how the different conditions we discuss in this work fit together. Arrows indicate that one condition implies (i.e., is a sufficient condition for) another. Conditions II-V are all sufficient, but not necessary, conditions for IPF to converge. Prior work has also defined several necessary and sufficient conditions for IPF to converge (pukelsheim2014); see Section \ref{sec:convergence-algo} for details.
  • Figure 4.2: Example of how we would modify $G_f$ to increase the overall flow by $\epsilon$, given nodes $n_i$ and $n_j$ (in green) that have not reached capacity.
  • ...and 9 more figures

Theorems & Definitions (28)

  • Theorem 3.1
  • Theorem 4.1
  • Theorem 4.2
  • proof
  • Theorem 2.1
  • proof
  • Theorem 2.2
  • proof
  • Lemma 3.1: Appendix F.3 of qu2023sinkhorn
  • Theorem 3.2
  • ...and 18 more