Staleness Factors and Volatility Estimation at High Frequencies

Xin-Bing Kong, Bin Wu, Wuyi Ye

TL;DR

This paper develops a nonlinear, high-dimensional price staleness factor model (SFM) in which staleness probabilities satisfy $p_{it}=\Psi(z_{it})$ with $z_{it}=a_i'x_{it}+\gamma_i'g_t$, estimated by maximum likelihood in an infill asymptotic regime with $d,n\to\infty$. It shows that staleness biases downward the efficient price co-volatilities estimated by LPCA, and it provides bias-corrected estimators for both spot and integrated volatilities that are robust to data staleness. The integrated plug-in estimates converge at rate $n^{-1/2}$, while the local PCA (spot) estimates converge at the slower rate $n^{-1/4}$. Through simulations and an empirical application to S&P 500 data, the paper further demonstrates that incorporating staleness improves cross-sectional risk pricing and reduces out-of-sample portfolio risk, and it offers a practical bias correction via inverse staleness weighting. Overall, it delivers an inference framework for price staleness in large panels, quantifies the impact of staleness on volatility estimation, and reports practical gains in asset pricing and portfolio construction.
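To make the model equation concrete, the following is a minimal sketch of how staleness probabilities $p_{it}=\Psi(a_i'x_{it}+\gamma_i'g_t)$ could be computed and used to draw staleness indicators. It is an illustration under stated assumptions, not the paper's estimation code: the logistic link is one possible choice for $\Psi$, and all parameter values, dimensions, and the function names `logistic` and `staleness_probabilities` are hypothetical.

```python
import numpy as np

def logistic(z):
    """Logistic link; one possible choice for Psi (an assumption of this sketch)."""
    return 1.0 / (1.0 + np.exp(-z))

def staleness_probabilities(a, x, gamma, g, link=logistic):
    """Return the (d, n) matrix p[i, t] = link(a_i' x_it + gamma_i' g_t).

    a:     (d, k) regression coefficients a_i
    x:     (d, n, k) covariates x_it
    gamma: (d, r) factor loadings gamma_i
    g:     (n, r) staleness factors g_t
    """
    z = np.einsum("ik,itk->it", a, x) + gamma @ g.T
    return link(z)

# Hypothetical dimensions and parameters, chosen only for illustration.
rng = np.random.default_rng(0)
d, n, k, r = 5, 78, 2, 3
a = rng.normal(scale=0.5, size=(d, k))
x = rng.normal(size=(d, n, k))
gamma = rng.normal(scale=0.5, size=(d, r))
# Random-walk factors, so that g_t is nonstationary as in the model description.
g = np.cumsum(rng.normal(scale=0.1, size=(n, r)), axis=0)

p = staleness_probabilities(a, x, gamma, g)
# stale[i, t] is True when asset i's price is drawn as stale at time t.
stale = rng.random((d, n)) < p
```

In the paper the parameters $a_i$, $\gamma_i$ and the factors $g_t$ are unknown and estimated by maximum likelihood; here they are simulated only so that the mapping from parameters to probabilities can be inspected.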

Abstract

In this paper, we propose a price staleness factor model that accounts for pervasive market friction across assets and incorporates relevant covariates. Using large-panel high-frequency data, we derive the maximum likelihood estimators of the regression coefficients, the nonstationary factors, and their loading parameters. These estimators recover the time-varying price staleness probabilities. We develop asymptotic theory in which both the dimension $d$ and the sampling frequency $n$ tend to infinity. Using a local principal component analysis (LPCA) approach, we find that the efficient price co-volatilities (systematic and idiosyncratic) are biased downward due to the presence of staleness. We provide bias-corrected estimators for both the spot and integrated systematic and idiosyncratic co-volatilities, and prove that these estimators are robust to data staleness. Interestingly, besides their dependence on the dimensionality $d$, the integrated plug-in estimates converge at a rate of $n^{-1/2}$ without requiring a correction term, whereas the local PCA estimates converge at a slower rate of $n^{-1/4}$. This validates the aggregation efficiency achieved through nonlinear, nonstationary factor analysis via maximum likelihood estimation. Numerical experiments support our theoretical findings. Empirically, we demonstrate that the staleness factor provides unique explanatory power for cross-sectional risk premia, and that the staleness correction reduces out-of-sample portfolio risk.
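The downward bias and the idea of correcting it by inverse staleness weighting can be illustrated with a deliberately simplified toy simulation. The sketch below is not the paper's estimator or price model. It assumes that a stale interval simply records a zero return with no later catch-up, that staleness is independent across assets and time, and that the staleness probabilities are constant and known. The covariance matrix `sigma` and probabilities `p` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                  # number of intraday returns (large, to limit Monte Carlo noise)
sigma = np.array([[1.0, 0.6], [0.6, 2.0]])   # assumed true integrated covariance
p = np.array([0.2, 0.4])                     # assumed constant, known staleness probabilities

# Efficient returns: i.i.d. Gaussian increments whose covariances sum to sigma.
r = rng.multivariate_normal(np.zeros(2), sigma / n, size=n)

# Toy staleness mechanism (assumption): a stale interval records a zero return.
fresh = rng.random((n, 2)) >= p
r_obs = r * fresh

# Realized covariance computed from the observed (partly stale) returns.
rc_raw = r_obs.T @ r_obs

# Under the toy model the expected shrinkage is (1 - p_i) on the diagonal
# and (1 - p_i)(1 - p_j) off the diagonal; dividing by it undoes the bias.
w = np.outer(1 - p, 1 - p)
np.fill_diagonal(w, 1 - p)
rc_corrected = rc_raw / w
```

Under these assumptions every entry of `rc_raw` is shrunk toward zero relative to `sigma`, and `rc_corrected` recovers `sigma` up to simulation noise. In the paper the staleness probabilities are time-varying and estimated from the factor model rather than known, so the actual correction is more involved than this element-wise division.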

Paper Structure

This paper contains 17 sections, 10 theorems, 43 equations, 4 figures, 1 table.

Key Result

Proposition 1

If the two staleness factor model assumptions hold, and if there exists a constant $\delta^{\dag}>0$ such that $\frac{d}{n^{1+\delta^{\dag}}}=o(1)$.

Figures (4)

  • Figure 1: Average daily staleness factors. Notes. This graph illustrates three estimated staleness factors (daily average) for 2014, derived from 5-minute sampling intervals.
  • Figure 2: Generalized correlations between staleness factors with other factors. Notes. The figure displays the generalized correlations of the first three staleness factors with: 1) Left panel: the four high-frequency continuous factors; 2) Middle panel: the Fama-French-Carhart factors; 3) Right panel: the full stock-panel data. Each correlation is computed using factor estimates from a rolling one-month window throughout 2014.
  • Figure 3: Time-varying explained variation by factor. Notes. This figure shows the percentage of continuous variation explained, computed using the method of Pelger (2019), over a rolling one-month window (21 trading days).
  • Figure 4: Out-of-sample portfolio risk (left panel: 5-minute; right panel: 1-minute). Notes. This figure compares the out-of-sample annualized volatility (for May 2014) of S&P 500 index constituents from April 2014. The x-axis represents the exposure constraint $c$ in the portfolio allocation optimization problem. Four volatility matrix estimators are compared: uncorrected spot volatility (Uncorrected SV), uncorrected integrated volatility (Uncorrected IV), corrected (logit type) spot volatility (Corrected SV), and corrected integrated volatility (Corrected IV). "Equal weight" refers to an equally weighted portfolio.

Theorems & Definitions (11)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Remark 1
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • ...and 1 more