Staleness Factors and Volatility Estimation at High Frequencies
Xin-Bing Kong, Bin Wu, Wuyi Ye
TL;DR
This paper develops a nonlinear, high-dimensional price staleness factor model (SFM) in which the staleness probabilities satisfy $p_{it}=\Psi(z_{it})$ with $z_{it}=a_i'x_{it}+\gamma_i'g_t$, estimated by maximum likelihood in an infill asymptotic regime with $d,n\to\infty$. It shows that staleness biases the LPCA estimates of the efficient-price co-volatilities downward, and it provides bias-corrected estimators of both spot and integrated volatilities that are robust to data staleness; the integrated estimates converge at rate $n^{-1/2}$ and the spot estimates at the slower rate $n^{-1/4}$. Simulations and an empirical application to S&P 500 data indicate that incorporating staleness improves cross-sectional risk pricing and reduces out-of-sample portfolio risk, with inverse staleness weighting offering a practical bias correction. Overall, the paper delivers an inference framework for price staleness in large panels and quantifies its impact on volatility estimation.
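To make the model $p_{it}=\Psi(z_{it})$, $z_{it}=a_i'x_{it}+\gamma_i'g_t$ concrete, the following minimal sketch simulates staleness probabilities and staleness indicators for a panel. Several choices are assumptions made for illustration and are not taken from the paper: $\Psi$ is taken to be the logistic link, the factors $g_t$ are scaled random walks (one simple form of nonstationarity), the covariates and coefficients are Gaussian, and the function name `simulate_staleness` and all parameter names are hypothetical.

```python
import numpy as np


def simulate_staleness(d=50, n=200, k=2, r=1, seed=0):
    """Simulate p_it = Psi(a_i' x_it + gamma_i' g_t) and staleness indicators.

    Assumptions (illustrative only, not from the paper):
    Psi is the logistic function, g_t is a scaled random walk,
    and covariates/coefficients are Gaussian.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(d, n, k))               # covariates x_it
    a = rng.normal(scale=0.5, size=(d, k))       # regression coefficients a_i
    gamma = rng.normal(scale=0.5, size=(d, r))   # factor loadings gamma_i
    # Nonstationary common factors g_t: random walk scaled by 1/sqrt(n)
    g = np.cumsum(rng.normal(size=(n, r)), axis=0) / np.sqrt(n)
    # Index z_it = a_i' x_it + gamma_i' g_t, shape (d, n)
    z = np.einsum("itk,ik->it", x, a) + gamma @ g.T
    p = 1.0 / (1.0 + np.exp(-z))                 # logistic link (assumed Psi)
    stale = rng.random((d, n)) < p               # True where the price is stale
    return p, stale


if __name__ == "__main__":
    p, stale = simulate_staleness()
    print("average staleness probability:", p.mean())
    print("observed fraction of stale prices:", stale.mean())
```

In a sketch of this kind, the cross-sectional average of the indicators tracks the average of $p_{it}$, which is the quantity the maximum likelihood estimators are meant to recover asset by asset and period by period.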
Abstract
In this paper, we propose a price staleness factor model that accounts for pervasive market friction across assets and incorporates relevant covariates. Using large-panel high-frequency data, we derive the maximum likelihood estimators of the regression coefficients, the nonstationary factors, and their loading parameters. These estimators recover the time-varying price staleness probabilities. We develop asymptotic theory in which both the dimension $d$ and the sampling frequency $n$ tend to infinity. Using a local principal component analysis (LPCA) approach, we find that the estimated efficient price co-volatilities (systematic and idiosyncratic) are biased downward due to the presence of staleness. We provide bias-corrected estimators for both the spot and integrated systematic and idiosyncratic co-volatilities, and prove that these estimators are robust to data staleness. Interestingly, apart from their dependence on the dimensionality $d$, the integrated plug-in estimates converge at a rate of $n^{-1/2}$ without requiring a correction term, whereas the local PCA estimates converge at a slower rate of $n^{-1/4}$. This validates the aggregation efficiency achieved through nonlinear, nonstationary factor analysis via maximum likelihood estimation. Numerical experiments support our theoretical findings. Empirically, we demonstrate that the staleness factor provides unique explanatory power for cross-sectional risk premia, and that the staleness correction reduces out-of-sample portfolio risk.
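The direction of the bias can be illustrated with a deliberately simplified toy example. It is a sketch under strong assumptions that are not the paper's setting: a single asset with constant volatility, a constant and known staleness probability `p_stale`, staleness independent of returns, and an observed return equal to zero whenever the price is stale (no catch-up on the next fresh observation). Under these assumptions the naive realized variance has expectation $(1-p)$ times the integrated variance, so dividing by $1-p$ removes the bias. The function name `realized_variance_demo` and its parameters are hypothetical.

```python
import numpy as np


def realized_variance_demo(n=100_000, sigma=0.2, p_stale=0.3, seed=1):
    """Toy illustration of downward bias from staleness and its correction.

    Assumptions (illustrative only, not the paper's model):
    constant volatility, constant known staleness probability,
    stale returns recorded as zero with no later catch-up.
    """
    rng = np.random.default_rng(seed)
    # Efficient returns on [0, 1] sampled at n equally spaced points
    r_eff = rng.normal(scale=sigma / np.sqrt(n), size=n)
    fresh = rng.random(n) >= p_stale       # True where the price updates
    r_obs = r_eff * fresh                  # stale periods show a zero return
    iv_true = sigma ** 2                   # integrated variance over [0, 1]
    rv_naive = np.sum(r_obs ** 2)          # biased downward by about (1 - p_stale)
    rv_corrected = rv_naive / (1.0 - p_stale)  # inverse staleness weighting
    return iv_true, rv_naive, rv_corrected


if __name__ == "__main__":
    iv, naive, corrected = realized_variance_demo()
    print("true integrated variance:", iv)
    print("naive realized variance:", naive)
    print("corrected realized variance:", corrected)
```

In the setting of the abstract the staleness probabilities are time-varying and must themselves be estimated, so the constant weight used here is only a stand-in for the estimated probabilities.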
