Semiparametric Modeling and Analysis for Longitudinal Network Data

Yinqiu He, Jiajin Sun, Yuang Tian, Zhiliang Ying, Yang Feng

TL;DR

This work addresses the challenge of learning a shared, static latent space from longitudinal network count data. It proposes a semiparametric Poisson latent-space model with a time-invariant latent matrix $Z$ and a time-varying baseline $\alpha_{it}$, and develops two estimation strategies: a generalized semiparametric one-step estimator based on the efficient score on the quotient manifold, and a nuclear-norm penalized maximum likelihood estimator of $G=ZZ^{\top}$. Both achieve near-oracle error rates for the latent structure. The theory establishes non-Euclidean convergence rates and accounts for identifiability under rotation. Practical validation includes simulation studies and an analysis of the New York Citi Bike dataset, where the estimated latent positions align with geography and the estimated baselines reveal meaningful activity patterns. Collectively, the paper provides a principled, efficient framework for inferring latent structure in longitudinal networks with node-time heterogeneity, with implications for prediction, hypothesis testing, and change-point analysis in complex networks.
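To make the model concrete, here is a minimal Python sketch of a Poisson latent-space log-likelihood. The summary does not spell out the link function, so we assume (hypothetically) that the count between nodes $i<j$ at time $t$ has mean $\exp(\alpha_{it}+\alpha_{jt}+z_i^\top z_j)$. Under that assumption the likelihood depends on $Z$ only through $G=ZZ^{\top}$, so any rotated matrix $ZO$ gives the same value: this is the identifiability issue the paper handles on the quotient manifold. Because $G$ is positive semidefinite, its nuclear norm is simply $\operatorname{tr}(G)=\|Z\|_F^2$. The function names and toy inputs are illustrative, not the paper's code or data.

```python
import math
import random

def gram(Z):
    """G = Z Z^T for an n x k matrix Z stored as a list of rows."""
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in Z] for zi in Z]

def poisson_loglik(A, alpha, Z):
    """Poisson log-likelihood, dropping the log(A!) constant.

    ASSUMED model (not taken from the paper): for t < T and i < j,
    A[t][i][j] ~ Poisson(exp(alpha[i][t] + alpha[j][t] + z_i^T z_j)).
    """
    G = gram(Z)
    n, T = len(Z), len(alpha[0])
    ll = 0.0
    for t in range(T):
        for i in range(n):
            for j in range(i + 1, n):
                eta = alpha[i][t] + alpha[j][t] + G[i][j]
                ll += A[t][i][j] * eta - math.exp(eta)
    return ll

def rotate(Z, theta):
    """Right-multiply each row of an n x 2 matrix by a 2 x 2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[x * c - y * s, x * s + y * c] for x, y in Z]

def nuclear_norm_psd(G):
    """Nuclear norm of a positive semidefinite matrix, which equals its trace."""
    return sum(G[i][i] for i in range(len(G)))

# Toy inputs: random latent positions and baselines, arbitrary placeholder counts.
random.seed(0)
n, T = 5, 3
Z = [[random.gauss(0, 0.5), random.gauss(0, 0.5)] for _ in range(n)]
alpha = [[random.gauss(0, 0.3) for _ in range(T)] for _ in range(n)]
A = [[[random.randint(0, 3) for _ in range(n)] for _ in range(n)] for _ in range(T)]

ll = poisson_loglik(A, alpha, Z)
ll_rot = poisson_loglik(A, alpha, rotate(Z, 0.7))
print(abs(ll - ll_rot) < 1e-9)  # True: rotating Z leaves the likelihood unchanged
print(abs(nuclear_norm_psd(gram(Z)) - sum(x * x + y * y for x, y in Z)) < 1e-12)  # True
```

The rotation check is why a Euclidean distance between $\hat Z$ and $Z^\star$ is not meaningful on its own; only the equivalence class of $Z$ (equivalently $G$) is identified.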

Abstract

We introduce a semiparametric latent space model for analyzing longitudinal network data. The model consists of a static latent space component and a time-varying node-specific baseline component. We develop a semiparametric efficient score equation for the latent space parameter by adjusting for the baseline nuisance component. Estimation is accomplished through a one-step update estimator and an appropriately penalized maximum likelihood estimator. We derive oracle error bounds for the two estimators and address identifiability concerns from a quotient manifold perspective. Our approach is demonstrated using the New York Citi Bike Dataset.
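The abstract's recipe, an efficient score that adjusts for a nuisance component followed by a one-step update, has a familiar finite-dimensional analogue, sketched below. It is that analogue only, not the paper's quotient-manifold construction: in a hypothetical Poisson regression $y_i \sim \mathrm{Poisson}(\exp(\eta + \theta x_i))$ with interest parameter $\theta$ and scalar nuisance $\eta$, the efficient score is $S_\theta - I_{\theta\eta} I_{\eta\eta}^{-1} S_\eta$, and one Newton-type step from an initial estimate divides it by the efficient information $I_{\theta\theta} - I_{\theta\eta}^2 / I_{\eta\eta}$. The function name and toy data are assumptions for illustration.

```python
import math

def one_step_theta(x, y, theta0, eta0):
    """One-step update of theta via the efficient score, with the nuisance eta
    plugged in at its initial value eta0 (toy parametric analogue only)."""
    mu = [math.exp(eta0 + theta0 * xi) for xi in x]
    resid = [yi - mi for yi, mi in zip(y, mu)]
    s_theta = sum(xi * ri for xi, ri in zip(x, resid))  # score for theta
    s_eta = sum(resid)                                  # score for eta
    i_tt = sum(xi * xi * mi for xi, mi in zip(x, mu))   # Fisher information blocks
    i_te = sum(xi * mi for xi, mi in zip(x, mu))
    i_ee = sum(mu)
    s_eff = s_theta - (i_te / i_ee) * s_eta             # project out the nuisance direction
    i_eff = i_tt - i_te * i_te / i_ee                   # efficient information
    return theta0 + s_eff / i_eff

# Toy data chosen so that the joint MLE is exactly (theta, eta) = (log 2, 0).
x = [0, 1, 2, 3]
y = [1, 2, 4, 8]
theta_hat = one_step_theta(x, y, theta0=0.6, eta0=0.1)
print(abs(theta_hat - math.log(2)) < abs(0.6 - math.log(2)))  # True: one step moves toward the MLE
```

At the joint MLE both scores vanish, so the update leaves $\theta$ unchanged; from a nearby start, a single step already lands close to $\log 2$.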

Paper Structure (100 sections, 42 theorems, 573 equations, 19 figures, 3 tables, 1 algorithm)

Key Result

Theorem 1

Assume Conditions cond:truvalueregularity--cond_elem_init hold. Let $\hat{Z}$ be the generalized one-step estimator defined in eq:newtonsolpseudo, and let $\varsigma = \max\{\epsilon,1/2\}$. For any constant $s>0$, there exists a constant $C_s > 0$ such that, when $n/ \log^{2\varsigma}(T)$ is sufficiently large, … where $r_{n,T}=\max\left\{1,\, \frac{T}{n}\right\} \log^{4\varsigma}(nT)$.
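Because the displayed bound is truncated, the sketch below only evaluates the quantities Theorem 1 does define: $\varsigma=\max\{\epsilon,1/2\}$, the rate factor $r_{n,T}$, and the ratio $n/\log^{2\varsigma}(T)$ that must be sufficiently large. The constant $\epsilon$ comes from the paper's conditions and is treated here as a user-supplied input; the function names are hypothetical. Up to logarithmic factors, $r_{n,T}$ is flat in $T$ while $T\le n$ and grows linearly in $T/n$ once $T>n$.

```python
import math

def varsigma(eps):
    """varsigma = max{eps, 1/2}, as in Theorem 1."""
    return max(eps, 0.5)

def rate_factor(n, T, eps):
    """r_{n,T} = max{1, T/n} * log^{4 varsigma}(nT)."""
    return max(1.0, T / n) * math.log(n * T) ** (4 * varsigma(eps))

def sample_size_ratio(n, T, eps):
    """n / log^{2 varsigma}(T); Theorem 1 needs this to be large (requires T > 1)."""
    return n / math.log(T) ** (2 * varsigma(eps))

# Compare the regimes T < n, T = n, and T > n for fixed n.
for n, T in [(200, 50), (200, 200), (200, 800)]:
    print(n, T, round(rate_factor(n, T, eps=0.5), 1), round(sample_size_ratio(n, T, eps=0.5), 1))
```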

Figures (19)

  • Figure 1: Each circle represents one equivalence class of $(z_1,z_2)$ giving the same value of $\bar{f}(z_1,z_2)$, so it suffices to search for the maximum along a single given direction.
  • Figure 2: Case (I): Empirical estimation errors of the one-step estimator. Panel (a) plots $\operatorname{dist}^2(\hat{Z}, Z^\star)$, averaged over 50 repetitions, against $T$ in scenario (a); Panel (b) plots the same quantity against $n$ in scenario (b). In (a) and (b), both axes are on a log scale, the three lines correspond to $k\in \{2,4,8\}$, and error bars show $\pm$ one standard deviation over the 50 repetitions. Panel (c) shows the slopes obtained by regressing $\log \operatorname{dist}^2(\hat{Z}, Z^\star)$ on $\log T$ in each of the 50 repetitions of scenario (a), with $(n,k)\in \{200\}\times \{2,4,8\}$.
  • Figure 3: Case (I): Empirical estimation errors of the penalized MLE. Panels (a)--(c) are presented similarly to Figure 2.
  • Figure 4: Case (II): Empirical estimation errors of the one-step estimator. Panels (a)--(c) are presented similarly to Figure 2.
  • Figure 5: Case (II): Empirical estimation errors of the penalized MLE. Panels (a)--(c) are presented similarly to Figure 2.
  • ...and 14 more figures
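The simulation captions report $\operatorname{dist}^2(\hat{Z}, Z^\star)$ and, in Panel (c), a log-log regression slope. The sketch below assumes $\operatorname{dist}$ is the usual orthogonal Procrustes distance, $\operatorname{dist}^2(\hat Z, Z^\star)=\min_{O\in\mathcal{O}(k)}\|\hat Z O - Z^\star\|_F^2$, which is invariant to rotating or reflecting $\hat Z$. For $k=2$ it uses the closed form $\|\hat Z\|_F^2+\|Z^\star\|_F^2-2\|\hat Z^\top Z^\star\|_*$, where the nuclear norm of a $2\times 2$ matrix $M$ equals $\sqrt{\|M\|_F^2+2|\det M|}$; larger $k$ would need an SVD. The slope is ordinary least squares of $\log$ error on $\log T$. Function names and inputs are illustrative, not the paper's simulation code.

```python
import math

def procrustes_dist2_k2(Z_hat, Z_star):
    """min over 2 x 2 orthogonal O of ||Z_hat O - Z_star||_F^2 (k = 2 only)."""
    fro_hat = sum(a * a + b * b for a, b in Z_hat)
    fro_star = sum(a * a + b * b for a, b in Z_star)
    # M = Z_hat^T Z_star, a 2 x 2 matrix
    m = [[sum(zh[p] * zs[q] for zh, zs in zip(Z_hat, Z_star)) for q in range(2)]
         for p in range(2)]
    fro_m = sum(v * v for row in m for v in row)
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    nuclear = math.sqrt(fro_m + 2 * abs(det_m))  # sum of the two singular values of M
    return max(0.0, fro_hat + fro_star - 2 * nuclear)  # clamp tiny negative round-off

def loglog_slope(ts, errors):
    """OLS slope from regressing log(error) on log(t)."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(e) for e in errors]
    xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# A rotated copy of Z_star is at distance zero; a rescaled copy is not.
Z_star = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
c, s = math.cos(1.1), math.sin(1.1)
Z_rot = [[x * c - y * s, x * s + y * c] for x, y in Z_star]
print(procrustes_dist2_k2(Z_rot, Z_star) < 1e-9)  # True
print(procrustes_dist2_k2([[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]))  # 2.0
# Errors decaying like 1/T give a slope of -1.
ts = [50, 100, 200, 400]
print(round(loglog_slope(ts, [3.0 / t for t in ts]), 6))  # -1.0
```

A slope near $-1$ in Panel (c) would indicate error decaying like $1/T$ at fixed $n$; the exact theoretical exponent depends on the truncated bound in Theorem 1.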

Theorems & Definitions (93)

  • Remark 1
  • Theorem 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 2 (lee2013smooth, Chapter 21)
  • Proposition 3
  • Remark 5
  • Theorem 4
  • Theorem 5
  • ...and 83 more