Dynamic Factor Analysis of High-dimensional Recurrent Events

Fangyi Chen; Yunxiao Chen; Zhiliang Ying; Kangjie Zhou

TL;DR

The paper tackles high-dimensional recurrent event data by introducing a semiparametric dynamic factor model in which the mean rate satisfies $E(dY_{ij}(t)) = f(X_{ij}(t))\,dt$ with $X_{ij}(t) = \sum_{k=1}^r a_{jk}\theta_{ik}(t)$, or in matrix form $\mathbf X(t)=\boldsymbol\Theta(t)\mathbf A^\top$. Estimation is achieved via a kernel-smoothed pseudo-likelihood under a low-rank structure, coupled with a two-step discretization strategy and a projected gradient descent algorithm. An information criterion $\text{IC}(r) = -2\mathcal L_h(\hat{\boldsymbol\Theta}^{(r)},\hat{\mathbf A}^{(r)}) + v(N,J,r)$ consistently selects the number of factors, and theoretical results show nearly rate-optimal convergence for the estimator and model-selection consistency under both dependent and independent block structures. Simulations demonstrate favorable performance relative to Poisson factor models when the latent dynamics are time-varying, and an application to grocery shopping data yields interpretable factors (healthy, unhealthy, and basic consumption) and insights into purchase dynamics. Overall, the framework provides a principled, scalable approach to uncovering latent dynamic structure in complex multivariate event-time data, with practical utility in market analytics and related fields.
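To make the model equation concrete, the following is a minimal sketch of how the low-rank predictor $X_{ij}(t) = \sum_{k=1}^r a_{jk}\theta_{ik}(t)$ maps to approximate expected event counts on a time grid. It is an illustration under stated assumptions, not the paper's estimation procedure: the link $f$ is taken to be the exponential function, the grid step `dt` is arbitrary, and the function names `linear_predictor` and `expected_counts` as well as all numeric values are hypothetical.

```python
import math


def linear_predictor(theta_t, A):
    """Compute X(t) = Theta(t) A^T at a single time point.

    theta_t: N x r nested list of factor scores theta_ik(t).
    A:       J x r nested list of loadings a_jk.
    Returns an N x J nested list with entries X_ij(t).
    """
    return [
        [sum(a_jk * th_ik for a_jk, th_ik in zip(a_j, th_i)) for a_j in A]
        for th_i in theta_t
    ]


def expected_counts(theta_path, A, dt, link=math.exp):
    """Approximate E[Y_ij(t + dt) - Y_ij(t)] by f(X_ij(t)) * dt per grid point.

    The exponential link is an assumption made for this sketch only.
    """
    out = []
    for theta_t in theta_path:
        X = linear_predictor(theta_t, A)
        out.append([[link(x) * dt for x in row] for row in X])
    return out


# Toy example (made-up numbers): N = 2 individuals, J = 3 event types, r = 1 factor.
A = [[1.0], [0.5], [-1.0]]
dt = 0.5
theta_path = [
    [[0.0], [1.0]],   # Theta at the first grid point
    [[0.5], [-1.0]],  # Theta at the second grid point
]
mean_counts = expected_counts(theta_path, A, dt)
```

Here `mean_counts[t][i][j]` approximates the expected number of type-$j$ events for individual $i$ in the grid cell starting at the $t$-th grid point. Because every $N \times J$ slice is a product of an $N \times r$ and an $r \times J$ matrix, its rank is at most $r$, which is the low-rank structure the summary refers to.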

Abstract

Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving many event types and observations have become prevalent with advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction of high-dimensional recurrent event data. The proposed model imposes a low-dimensional structure on the mean intensity functions of the event types while allowing for dependencies. A nearly rate-optimal smoothing-based estimator is proposed. An information criterion that consistently selects the number of factors is also developed. Simulation studies demonstrate the effectiveness of these inference tools. The proposed method is applied to grocery shopping data, for which an interpretable factor structure is obtained.
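As a rough illustration of how an information criterion of the form $\text{IC}(r) = -2\mathcal L_h + v(N,J,r)$ can be used to choose the number of factors, the sketch below picks the $r$ that minimizes the criterion over a set of candidates. The pseudo-likelihood values are made-up numbers, and the penalty function is a hypothetical placeholder: the summary does not state the form of $v(N,J,r)$, so `toy_penalty` should not be read as the paper's choice.

```python
def select_num_factors(loglik_by_r, penalty):
    """Return (best_r, ic) where ic[r] = -2 * loglik_by_r[r] + penalty(r).

    loglik_by_r: dict mapping a candidate r to the maximized
                 (pseudo-)log-likelihood of the rank-r fit.
    penalty:     callable r -> v(N, J, r); its form is an assumption here.
    """
    ic = {r: -2.0 * ll + penalty(r) for r, ll in loglik_by_r.items()}
    best_r = min(ic, key=ic.get)
    return best_r, ic


# Illustrative values only; these are not results from the paper.
N, J = 100, 20
loglik_by_r = {1: -1500.0, 2: -1200.0, 3: -1195.0, 4: -1193.0}


def toy_penalty(r):
    # Hypothetical penalty that grows linearly in r; not the paper's v(N, J, r).
    return 0.5 * r * (N + J)


best_r, ic_values = select_num_factors(loglik_by_r, toy_penalty)
```

In this toy setting the fit improves sharply from $r=1$ to $r=2$ and only marginally afterwards, so the penalty outweighs the gain beyond two factors and `best_r` is 2.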

Paper Structure (26 sections, 8 theorems, 149 equations, 1 figure, 16 tables)

Introduction
Proposed Method
Model
Estimation
Determining the Number of Factors
Theoretical properties
Consistency and Rate of Convergence
Model Selection Consistency
Simulation Study
Application to Grocery Shopping Data
Background, Data Processing, and Analysis
Interpreting Factors
Investigating Purchase Dynamics
Discussions
Proof of Theorems and Lemmas
...and 11 more sections

Key Result

Theorem 1

Under Conditions 1-4:

Figures (1)

Figure 1: Quartiles of the variability of the most frequently purchased product types.

Theorems & Definitions (35)

Remark 1: Link function
Remark 2: Intensity formulation
Remark 3: Connection with factor models
Remark 4: Indeterminacy of $\boldsymbol\Theta(\cdot)$ and $\mathbf A$ and a rotated solution
Remark 5: Time-varying loadings
Remark 6
Remark 7
Remark 8
Remark 9
Remark 10
...and 25 more
