Detection of Block-Exchangeable Structure in Large-Scale Correlation Matrices

Samuel Perreault, Thierry Duchesne, Johanna G. Nešlehová

Abstract

Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, the sample estimates of correlation matrices are typically noisy and conceal underlying dependence patterns. We consider the case when the variables can be grouped into K clusters with exchangeable dependence; this assumption is often made in applications, e.g., in finance and econometrics. Under this partial exchangeability condition, the corresponding correlation matrix has a block structure and the number of unknown parameters is reduced from d(d-1)/2 to at most K(K+1)/2. We propose a robust algorithm based on Kendall's rank correlation to identify the clusters without assuming the knowledge of K a priori or anything about the margins except continuity. The corresponding block-structured estimator performs considerably better than the sample Kendall rank correlation matrix when K < d. The new estimator can also be much more efficient in finite samples even in the unstructured case K = d, although there is no gain asymptotically. When the distribution of the data is elliptical, the results extend to linear correlation matrices and their inverses. The procedure is illustrated on financial stock returns.
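As a rough illustration of why a block structure helps, the sketch below computes a sample Kendall's tau matrix and then averages its entries within each block of a partition that is assumed to be known. This is not the paper's algorithm, which also has to identify the clusters. The function names (`kendall_tau`, `tau_matrix`, `block_average`), the toy factor model, and the cluster labels are all assumptions made for illustration only.

```python
import itertools
import random


def kendall_tau(x, y):
    """Sample Kendall's tau for two equal-length samples without ties."""
    n = len(x)
    s = 0
    for a, b in itertools.combinations(range(n), 2):
        prod = (x[a] - x[b]) * (y[a] - y[b])
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return 2.0 * s / (n * (n - 1))


def tau_matrix(columns):
    """Matrix of pairwise Kendall's tau values with ones on the diagonal."""
    d = len(columns)
    T = [[1.0] * d for _ in range(d)]
    for i, j in itertools.combinations(range(d), 2):
        T[i][j] = T[j][i] = kendall_tau(columns[i], columns[j])
    return T


def block_key(labels, i, j):
    """Unordered pair of cluster labels identifying the block of entry (i, j)."""
    return tuple(sorted((labels[i], labels[j])))


def block_average(T, labels):
    """Replace every off-diagonal entry by the mean of its block."""
    d = len(T)
    sums, counts = {}, {}
    for i, j in itertools.combinations(range(d), 2):
        key = block_key(labels, i, j)
        sums[key] = sums.get(key, 0.0) + T[i][j]
        counts[key] = counts.get(key, 0) + 1
    B = [[1.0] * d for _ in range(d)]
    for i, j in itertools.combinations(range(d), 2):
        key = block_key(labels, i, j)
        B[i][j] = B[j][i] = sums[key] / counts[key]
    return B


# Toy data (hypothetical): each variable is its cluster's factor plus noise.
random.seed(0)
n = 200
labels = [0, 0, 0, 1, 1]  # assumed known partition with K = 2 clusters
d = len(labels)
factors = [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
columns = [[factors[g][t] + random.gauss(0, 1) for t in range(n)] for g in labels]

T_hat = tau_matrix(columns)             # noisy estimate, one value per pair
T_block = block_average(T_hat, labels)  # one value per block

n_pairs = d * (d - 1) // 2              # d(d-1)/2 free entries without structure
n_blocks = len({block_key(labels, i, j)
                for i, j in itertools.combinations(range(d), 2)})
print(n_pairs, n_blocks)
```

With d = 5 variables in K = 2 clusters, the unstructured matrix has d(d-1)/2 = 10 free entries, whereas the block-averaged version has only K(K+1)/2 = 3 distinct off-diagonal values. Each of those values is a mean over several noisy pairwise estimates.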

Paper Structure

This paper contains 18 sections, 18 theorems, 86 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

Suppose that the partition $\mathcal{G}$ of $\{1,\dots, d\}$ satisfies the main assumption (`ass:main`) and that $X_{i_1} \sim X_{i_2}$ and $X_{j_1} \sim X_{j_2}$, where $i_1 \neq j_1$ and $i_2 \neq j_2$. Then the copulas $C_{i_1j_1}$ and $C_{i_2j_2}$ are identical and, consequently, $\boldsymbol{\mathrm{T}}_{i_1j_1} = \boldsymbol{\mathrm{T}}_{i_2j_2}$.
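To make the block structure concrete, consider a small worked case, assuming that $X_i \sim X_j$ means that $X_i$ and $X_j$ belong to the same cluster of $\mathcal{G}$. The values $a$, $b$, $c$ below are placeholder symbols, not quantities from the paper. Take $d = 4$ and $\mathcal{G} = \{\{1,2\},\{3,4\}\}$, so that $K = 2$. Equality of the Kendall's tau entries across exchangeable pairs then gives, for instance, $\boldsymbol{\mathrm{T}}_{13} = \boldsymbol{\mathrm{T}}_{14} = \boldsymbol{\mathrm{T}}_{23} = \boldsymbol{\mathrm{T}}_{24}$, and the matrix takes the form

$$
\boldsymbol{\mathrm{T}} =
\begin{pmatrix}
1 & a & c & c \\
a & 1 & c & c \\
c & c & 1 & b \\
c & c & b & 1
\end{pmatrix}.
$$

Only $K(K+1)/2 = 3$ parameters ($a$, $b$, $c$) remain, compared with $d(d-1)/2 = 6$ in the unstructured case.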

Figures (10)

  • Figure 1: The empirical Kendall's tau matrix of $107$ stocks included in the NASDAQ100 index in the original labeling (left) and after relabeling (middle). The right panel shows the improved estimate obtained from Algorithm 1 and structure selection with $\alpha = 0.5$.
  • Figure 1: The matrix $\boldsymbol{\mathrm{T}}$ (left) and a sub-matrix of $\boldsymbol{\Sigma}$ (right) from Example A.1. The cells are tinted so that, in each matrix, all entries sharing the same value have the same color and color intensity.
  • Figure 2: Cluster membership and Kendall correlation matrices before ($\boldsymbol{\Delta}^*$ and $\boldsymbol{\mathrm{T}}^*$) and after ($\boldsymbol{\Delta}$ and $\boldsymbol{\mathrm{T}}$) relabeling of the variables.
  • Figure 2: Submatrices of $\boldsymbol{\Sigma}$, $\boldsymbol{\Theta}$ and $(\boldsymbol{\tau} + \mathbf{1})(\boldsymbol{\tau} + \mathbf{1})^\top$ from Example A.2. The same vectorization of $\boldsymbol{\mathrm{T}}$ as in Example A.1 is used.
  • Figure 3: The matrices $\boldsymbol{\mathrm{T}}$, $\boldsymbol{\hat{\mathrm{T}}}$ and $\boldsymbol{\tilde{\mathrm{T}}}$ in Example 3.
  • ...and 5 more figures

Theorems & Definitions (47)

  • Definition 1
  • Proposition 1
  • Remark 1
  • Example 1
  • Example 2
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Example 3
  • Definition 2
  • ...and 37 more