On Statistical Inference for High-Dimensional Binary Time Series

Dehao Dai, Yunyi Zhang

TL;DR

This work tackles inference for high-dimensional binary time series by adopting a generalized binary VAR (gbVAR) framework with sparse coefficient matrices. It introduces a post-selection estimator that yields tractable, closed-form estimates on the selected supports, and it proves model-selection consistency along with a Gaussian approximation for the estimator. To enable valid uncertainty quantification, the authors develop a second-order wild bootstrap that accounts for complex temporal dependence and random coefficients. The theory rests on Bernstein-type concentration results; simulations and real-data applications (portfolio management and global trade) illustrate accurate inference and interpretable, sparsity-driven dynamic networks. Overall, the paper provides a coherent theoretical and empirical toolkit for statistical inference in high-dimensional binary time-series settings, with practical relevance to networks and economics.

Abstract

The analysis of non-real-valued data, such as binary time series, has attracted great interest in recent years. This manuscript proposes a post-selection estimator for estimating the coefficient matrices of a high-dimensional generalized binary vector autoregressive process and establishes a Gaussian approximation theorem for the proposed estimator. Furthermore, it introduces a second-order wild bootstrap algorithm to enable statistical inference on the coefficient matrices. Numerical studies and empirical applications demonstrate the good finite-sample performance of the proposed method.
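The threshold-then-refit idea behind a post-selection estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the pilot estimate (for example, a row of a Lasso fit) is taken as given, the helper names `solve_linear` and `post_selection_row` are hypothetical, and the refit is a plain least-squares fit without centering, which may differ from the estimator defined in the paper.

```python
def solve_linear(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular system on the selected support")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol


def post_selection_row(pilot_row, X, y, b_d):
    """Threshold a pilot estimate at b_d, then refit on the selected support.

    pilot_row: pilot coefficients for one row (assumed given, e.g. from a Lasso).
    X: list of lagged observation vectors; y: the matching responses.
    b_d: threshold; coefficients with |value| <= b_d are set to zero.
    """
    support = [j for j, c in enumerate(pilot_row) if abs(c) > b_d]
    est = [0.0] * len(pilot_row)
    if not support:
        return est, support
    # Normal equations restricted to the selected coordinates.
    G = [[sum(row[i] * row[j] for row in X) for j in support] for i in support]
    g = [sum(row[i] * yt for row, yt in zip(X, y)) for i in support]
    for j, c in zip(support, solve_linear(G, g)):
        est[j] = c
    return est, support
```

Because the refit only involves the selected coordinates, it reduces to a small linear system with an explicit solution, which is the sense in which estimates on the selected support are tractable.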

Paper Structure

This paper contains 31 sections, 17 theorems, 178 equations, 6 figures, 3 tables.

Key Result

Proposition 3

Let $(X_t)_{t\in \mathbb{Z}}$ be a $d$-dimensional gbVAR$(1)$ process satisfying the gbVAR model equation for all $t\in \mathbb{Z}$, and suppose the stationarity condition holds. Then the gbVAR(1) model has a gbVMA$(\infty)$-type representation, with convergence holding in the $L_1$ norm.
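To make the object in this proposition concrete, the sketch below simulates a binary vector process of gbVAR(1) type. The model equation is not reproduced on this page, so the code assumes a selection-mechanism formulation: at each time step, component $k$ copies a lagged component $j$ with probability $|A_{kj}|$ (flipping it when $A_{kj} < 0$), or otherwise draws a fresh Bernoulli innovation. The function name `simulate_gbvar1`, the parameter `e_prob`, and the requirement that each row of absolute coefficients sums to less than one are assumptions of this illustration.

```python
import random


def simulate_gbvar1(A, e_prob, n, seed=0):
    """Simulate n steps of an assumed gbVAR(1)-type binary process.

    A: d x d coefficient matrix; each row must satisfy sum_j |A[k][j]| < 1.
    e_prob: success probabilities of the Bernoulli innovations, one per row.
    """
    rng = random.Random(seed)
    d = len(A)
    # Remaining probability mass in each row goes to the innovation term.
    beta = [1.0 - sum(abs(a) for a in row) for row in A]
    if any(b <= 0 for b in beta):
        raise ValueError("each row needs sum of |coefficients| < 1")
    x = [int(rng.random() < e_prob[k]) for k in range(d)]  # initial state
    path = []
    for _ in range(n):
        new = []
        for k in range(d):
            weights = [abs(a) for a in A[k]] + [beta[k]]
            j = rng.choices(range(d + 1), weights=weights)[0]
            if j == d:
                new.append(int(rng.random() < e_prob[k]))  # fresh innovation
            elif A[k][j] >= 0:
                new.append(x[j])        # copy lagged component j
            else:
                new.append(1 - x[j])    # negative coefficient: flip it
        x = new
        path.append(x)
    return path
```

Under this assumed mechanism, unrolling the recursion expresses each observation through past innovations, which is the intuition behind a moving-average-type representation.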

Figures (6)

  • Figure 1: Realization of gbVAR(1) process $(X_t)_{t=1, \ldots, 5}$ with 10 nodes: the size of a node is proportional to its degree after discretization.
  • Figure 2: The three panels respectively depict the positions of the true non-zero parameters and the corresponding estimated positions from the row-wise algorithm and from the post-selection estimator. In the first panel, $\mathcal{A}^{(1)}$, $\mathcal{A}^{(2)}$ and $\mathcal{A}^{(3)}$ represent the true parameter matrices in DGP1, DGP2, and DGP3, respectively. The observations are generated with $n = 1500$, $d = 80$.
  • Figure 3: Performance of the Lasso and the post-selection algorithm under different hyperparameter choices. Panels (a), (b), and (c) show the row-wise $\ell_2$ norms of the estimation error for different $\ln$-scaled tuning parameters $\lambda$ and thresholds $b_d$ in DGP1, DGP2, and DGP3, with $n = 1500$, $d = 80$. For the post-selection algorithm, the left plot shows performance with $b_d$ set to its optimal value, while the right plot shows performance with $\lambda$ set to its optimal value.
  • Figure 4: Weekday time series $X_{t,k}$, $k = 1, \ldots, 7$, indicating Advance (black dots) and Decline (no dots) of the closing price between days $t-1$ and $t$ for seven NASDAQ stocks: Apple (AAPL), NVIDIA (NVDA), Facebook (FB, currently META), Tesla (TSLA), Microsoft (MSFT), Amazon.com (AMZN), and Alphabet (GOOGL), from September 10, 2020 to September 10, 2021.
  • Figure 5: Heatmaps of estimated coefficient matrices for fitted gbVAR(1) processes on the seven stocks data sample based on (a) OLS $\widehat{\mathcal{A}}^\mathtt{OLS}$ (b) Post-selection algorithm $\widehat{\mathcal{A}}^\mathtt{Post}$. Purple boxes indicate statistically significant coefficients, i.e., that -0.08 and -0.05 are not contained within their confidence intervals.
  • ...and 1 more figures
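The Advance/Decline coding described for Figure 4 can be sketched as a simple transformation of a closing-price series. This is an illustration only: the function name `advance_indicator` is hypothetical, and treating an unchanged close as a Decline is an assumption, since the caption does not say how ties are handled.

```python
def advance_indicator(closes):
    """Map closing prices to a binary series: 1 = Advance, 0 = Decline.

    Entry t compares the close on day t with the close on day t-1, so the
    output is one element shorter than the input. Ties are coded as 0
    (an assumption of this sketch).
    """
    return [int(curr > prev) for prev, curr in zip(closes, closes[1:])]
```

Applying this to each of the seven stocks and stacking the results by date would give a 7-dimensional binary time series of the kind the caption describes.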

Theorems & Definitions (33)

  • Definition 1: gbVAR($p$) processes
  • Remark 2
  • Example 1: 3-variate gbVAR(1) model
  • Example 2: gbVAR(1) random graph
  • Proposition 3: Moving-average representation of gbVAR(1) process
  • Remark 4
  • Proposition 5: Yule-Walker equations for gbVAR($1$) models
  • Remark 6
  • Definition 7: Partial inverse operator $\mathcal{F}.(\cdot)$
  • Remark 8
  • ...and 23 more