Dynamic Factor Analysis of High-dimensional Recurrent Events
Fangyi Chen, Yunxiao Chen, Zhiliang Ying, Kangjie Zhou
TL;DR
The paper tackles high-dimensional recurrent event data by introducing a semiparametric dynamic factor model in which the mean rate satisfies $E(dY_{ij}(t)) = f(X_{ij}(t))\,dt$ with $X_{ij}(t) = \sum_{k=1}^r a_{jk}\theta_{ik}(t)$, or in matrix form $\mathbf X(t)=\boldsymbol\Theta(t)\mathbf A^\top$. Estimation uses a kernel-smoothed pseudo-likelihood under the low-rank structure, coupled with a two-step discretization strategy and a projected gradient descent algorithm. An information criterion $\text{IC}(r) = -2\mathcal L_h(\hat{\boldsymbol\Theta}^{(r)},\hat{\mathbf A}^{(r)}) + v(N,J,r)$, with penalty term $v(N,J,r)$, consistently selects the number of factors. Theoretical results show near rate-optimal convergence for the estimator and model-selection consistency under both dependent and independent block structures. Simulations demonstrate favorable performance relative to Poisson-factor models when latent dynamics are time-varying, and an application to grocery shopping data yields interpretable factors (healthy, unhealthy, and basic consumption) and insights into purchase dynamics. Overall, the framework provides a principled, scalable approach to uncovering latent dynamic structure in complex multivariate event-time data, with practical utility in market analytics and related fields.
Abstract
Recurrent event time data arise in many studies, including biomedicine, public health, marketing, and social media analysis. High-dimensional recurrent event data involving many event types and observations have become prevalent with advances in information technology. This paper proposes a semiparametric dynamic factor model for the dimension reduction of high-dimensional recurrent event data. The proposed model imposes a low-dimensional structure on the mean intensity functions of the event types while allowing for dependencies. A nearly rate-optimal smoothing-based estimator is proposed. An information criterion that consistently selects the number of factors is also developed. Simulation studies demonstrate the effectiveness of these inference tools. The proposed method is applied to grocery shopping data, for which an interpretable factor structure is obtained.
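To make the low-dimensional structure concrete, the following is a minimal simulation sketch of the mean-rate model $E(dY_{ij}(t)) = f(X_{ij}(t))\,dt$ with $\mathbf X(t)=\boldsymbol\Theta(t)\mathbf A^\top$. It is not the paper's estimator. Several choices are assumptions made only for illustration: the link $f=\exp$, the sinusoidal form of the factors $\theta_{ik}(t)$, the Poisson draw for counts in each small time bin (the model itself is semiparametric and does not require Poisson increments), and the hypothetical function name `simulate_counts`.

```python
import numpy as np

def simulate_counts(N=50, J=20, r=3, T=100, horizon=1.0, seed=0):
    """Simulate binned event counts from a rank-r dynamic factor rate model.

    Illustrative assumptions (not from the paper): f = exp, sinusoidal
    factors, and Poisson counts within each time bin.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, horizon, T, endpoint=False)  # left edges of time bins
    dt = horizon / T

    # Time-varying factors theta_ik(t) = b_ik + c_ik * sin(2*pi*t), shape (T, N, r)
    b = rng.normal(size=(N, r))
    c = rng.normal(scale=0.5, size=(N, r))
    theta = b[None, :, :] + c[None, :, :] * np.sin(2 * np.pi * t)[:, None, None]

    # Time-invariant loadings a_jk, shape (J, r)
    A = rng.normal(scale=0.5, size=(J, r))

    # X(t) = Theta(t) A^T for every t, shape (T, N, J); each slice has rank <= r
    X = theta @ A.T

    # Mean rate f(X_ij(t)) with the assumed link f = exp
    rate = np.exp(X)

    # Counts of event type j for individual i in each bin of width dt
    counts = rng.poisson(rate * dt)
    return t, theta, A, X, counts

t, theta, A, X, counts = simulate_counts()
print(X.shape, counts.shape, np.linalg.matrix_rank(X[0]))
```

Each time slice `X[s]` is an $N \times J$ matrix of rank at most $r$, which is the dimension reduction the model exploits: the $N \times J$ mean rates at any time are driven by only $r$ latent factor trajectories per individual and $r$ loadings per event type.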
