A spectral clustering-type algorithm for the consistent estimation of the Hurst distribution in moderately high dimensions

Patrice Abry, Gustavo Didier, Oliver Orejola, Herwig Wendt

TL;DR

The paper tackles the challenge of identifying multiple scaling laws in high-dimensional fractal systems by estimating the Hurst distribution $\pi(dH)$ from wavelet-based spectral information. It introduces WRMSM, a pipeline that combines wavelet random matrices, a modified spectral clustering procedure (HD$\varepsilon$ES), and a model-selection step to recover the Hurst modes and their probabilities in a moderately high-dimensional regime. Theoretical contributions include a consistency theorem in the three-way limit and supporting propositions, while computational results demonstrate superior finite-sample performance over Gaussian mixture models. An application to macroeconomic time series shows evidence of cointegration, illustrating the method's practical relevance for uncovering long-run relationships in complex data.

Abstract

Scale invariance (fractality) is a prominent feature of the large-scale behavior of many stochastic systems. In this work, we construct an algorithm for the statistical identification of the Hurst distribution (in particular, the scaling exponents) undergirding a high-dimensional fractal system. The algorithm is based on wavelet random matrices, modified spectral clustering and a model selection step for picking the value of the clustering precision hyperparameter. In a moderately high-dimensional regime where the dimension, the sample size and the scale go to infinity, we show that the algorithm consistently estimates the Hurst distribution. Monte Carlo simulations show that the proposed methodology is efficient for realistic sample sizes and outperforms another popular clustering method based on mixed-Gaussian modeling. We apply the algorithm in the analysis of real-world macroeconomic time series to unveil evidence for cointegration.
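To make the clustering step concrete, the sketch below shows one very simple way to turn a set of one-dimensional scaling-exponent estimates into estimated Hurst modes and probabilities. It is not the paper's HD$\varepsilon$ES procedure: it assumes the rescaled wavelet log-eigenvalues have already been mapped to the scale of $\pi(dH)$, and it uses a plain gap rule (start a new cluster whenever two consecutive sorted values differ by more than `eps`) as a stand-in. The function names, the toy input and the `eps` values are all hypothetical.

```python
import numpy as np


def epsilon_gap_clusters(h_values, eps):
    """Sort 1-D values and split them wherever consecutive values differ by more than eps."""
    h = np.sort(np.asarray(h_values, dtype=float))
    if h.size == 0:
        return []
    # Positions at which a new cluster starts (gap to the previous value exceeds eps).
    breaks = np.flatnonzero(np.diff(h) > eps) + 1
    return np.split(h, breaks)


def estimate_hurst_distribution(h_values, eps):
    """Return (modes, probabilities): cluster means and relative cluster sizes."""
    clusters = epsilon_gap_clusters(h_values, eps)
    n = sum(len(c) for c in clusters)
    modes = np.array([c.mean() for c in clusters])
    probs = np.array([len(c) / n for c in clusters])
    return modes, probs


# Toy input (assumption): 20 estimates spread symmetrically around each of three modes.
true_modes = np.array([0.2, 0.5, 0.8])
offsets = np.linspace(-0.02, 0.02, 20)
h_hat = np.repeat(true_modes, 20) + np.tile(offsets, 3)

modes, probs = estimate_hurst_distribution(h_hat, eps=0.1)
modes_coarse, probs_coarse = estimate_hurst_distribution(h_hat, eps=0.3)
```

On this toy input, `eps=0.1` separates the values into three equally weighted groups centered at the three modes, whereas `eps=0.3` merges everything into a single group. This sensitivity to the precision hyperparameter is the kind of issue the model-selection step in the abstract is meant to address.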

Paper Structure

This paper contains 16 sections, 129 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The distribution of the (rescaled logarithmic) wavelet empirical spectral distribution (e.s.d.) in the three-way limit \ref{e:three-fold_lim} (see Abry, Didier et al. [abry:didier:orejola:wendt:2024]). A Monte Carlo study displays a tri-modal distribution emerging in the rescaled logarithmic wavelet e.s.d. in that limit (n.b.: after applying an affine transformation, the results are shown on the same scale as that of the distribution $\pi(dH)$). In the depicted simulation study, based on $1000$ realizations, $\pi(dH)$ is a discrete uniform distribution with support $\{0.2,0.5,0.8\}$. For the left and right plots, respectively, $(\textnormal{sample size}, \textnormal{scale},\textnormal{dimension} ) = (2^{10}, 2^4, 2^3)$ and $(2^{18}, 2^6, 2^6)$. Wavelet log-eigenvalues weighted over multiple scales were used for enhanced ("debiased") finite-sample convergence (cf. Abry and Didier [abry:didier:2018:n-variate] and Abry et al. [abry:boniece:didier:wendt:2023:regression]).
  • Figure 2: How many Hurst modes? Notwithstanding the good asymptotic properties of the wavelet log-e.s.d. (see Theorem \ref{t:main_theorem_discrete}), over finite samples the information it provides may be ambiguous when differences between Hurst modes are small relative to the sample size. For the sake of illustration, the log-eigenvalue plot displayed might suggest the existence of two modes $0 < \breve{H}_1 < \breve{H}_2 < 1$ appearing with probabilities $\pi(\breve{H}_1) > \pi(\breve{H}_2)$. In fact, the data were simulated from a discrete uniform distribution on $\{0.25,0.29,0.7\}$, as indicated by the vertical dotted lines.
  • Figure 3: Bimodal Hurst distributions. Proportion of correct identification of the number of Hurst modes ($\widehat{r}_{\varepsilon_{ms}}=2$) over 1000 Monte Carlo runs. Left plot: $\pi(dH)$ non-uniform distribution. Right plot: $\pi(dH)$ uniform distribution. In both plots, WRMSM and GMM-based clustering appear in blue and red, respectively.
  • Figure 4: Trimodal Hurst distributions. Proportion of correct identification of the number of Hurst modes ($\widehat{r}_{\varepsilon_{ms}}=3$) over 1000 Monte Carlo runs. Left plot: $\breve{H}_1, \breve{H}_3$ fixed with $\breve{H}_2$ varying. Right plot: $\breve{H}_1, \breve{H}_2, \breve{H}_3$ equidistant. In both plots, WRMSM and GMM-based clustering appear in blue and red, respectively.
  • Figure 5: Optimal $\varepsilon>0$ chosen via model selection (left plot). The optimal hyperparameter value $\varepsilon_{ms}$ (see \ref{e:model_selection_threhsold}) was computed by averaging over 1000 Monte Carlo experiments. Instances of wavelet log-e.s.d. (right plots). $|\breve{H}_2-\breve{H}_1| = 0.04$ (top left), $|\breve{H}_2-\breve{H}_1| = 0.06$ (top right), $|\breve{H}_2-\breve{H}_1| = 0.08$ (bottom left), $|\breve{H}_2-\breve{H}_1| = 0.10$ (bottom right).
  • ...and 1 more figures