
Consistent Estimation of a Class of Distances Between Covariance Matrices

Roberto Pereira, Xavier Mestre, David Gregoratti

TL;DR

This paper tackles the problem of consistently estimating distances between covariance matrices directly from high-dimensional data. It introduces a broad class of distances $d_M(1,2)=\sum_{l=1}^L \frac{1}{M}\mathrm{tr}[ f_1^{(l)}(\mathbf{R}_1) f_2^{(l)}(\mathbf{R}_2) ]$ that includes the Euclidean (EU), log-Euclidean (LE), and symmetrized Kullback-Leibler (KL) distances. It develops a resolvent- and contour-based estimator $\hat{d}_M(1,2)$ that remains consistent as $M,N_j\to\infty$ with $c_j = M/N_j$ in various regimes, and proves a central limit theorem for the vector of distances with explicit asymptotic means and variances. The paper provides closed-form estimators for the Euclidean distance, the KL divergence, and a computable version of the LE distance, along with simplified single-integral expressions for their variances that enable practical statistical inference. Numerical experiments show that these consistent estimators outperform plug-in distances in high-dimensional settings and enable reliable clustering based on covariance structure.
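The class of distances above can be illustrated with a minimal plug-in sketch that evaluates $d_M(1,2)$ when the true covariances are known. Each scalar function $f_j^{(l)}$ is applied through an eigendecomposition, and the Euclidean, LE, and symmetrized KL distances are written as lists of $(f_1^{(l)}, f_2^{(l)})$ pairs. The helper names (`matfun`, `class_distance`, `toeplitz_cov`), the AR(1)-type test covariances $[\mathbf{R}]_{ik}=\rho^{|i-k|}$, and the normalization chosen for the KL term are assumptions of this sketch, not the paper's definitions.

```python
import numpy as np


def matfun(R, f):
    """Apply the scalar function f to a symmetric positive definite matrix R
    via its eigendecomposition R = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(R)
    return (U * f(lam)) @ U.T


def class_distance(R1, R2, terms):
    """Evaluate d_M(1,2) = sum_l (1/M) tr[f1^(l)(R1) f2^(l)(R2)].
    `terms` is a list of (f1, f2) pairs of scalar functions (hypothetical interface)."""
    M = R1.shape[0]
    return sum(np.trace(matfun(R1, f1) @ matfun(R2, f2)) / M for f1, f2 in terms)


def one(x):
    """Constant function 1, so that matfun(R, one) is the identity matrix."""
    return np.ones_like(x)


# Squared Euclidean: (1/M)||R1 - R2||_F^2 = (1/M)tr R1^2 + (1/M)tr R2^2 - (2/M)tr R1 R2
EU = [(np.square, one), (one, np.square), (lambda x: -2.0 * x, lambda x: x)]

# Log-Euclidean: (1/M)||log R1 - log R2||_F^2, same three-term structure with log
LE = [(lambda x: np.log(x) ** 2, one), (one, lambda x: np.log(x) ** 2),
      (lambda x: -2.0 * np.log(x), np.log)]

# Symmetrized KL, assuming the normalization (1/(2M)) tr[R1^-1 R2 + R2^-1 R1] - 1
KL = [(lambda x: 0.5 / x, lambda x: x), (lambda x: 0.5 * x, lambda x: 1.0 / x),
      (lambda x: -np.ones_like(x), one)]


def toeplitz_cov(M, rho):
    """AR(1)-type covariance with entries rho^|i-k| (assumed test model)."""
    idx = np.arange(M)
    return rho ** np.abs(np.subtract.outer(idx, idx))


R1, R2 = toeplitz_cov(50, 0.8), toeplitz_cov(50, 0.4)
for name, terms in [("EU", EU), ("LE", LE), ("KL", KL)]:
    print(name, class_distance(R1, R2, terms))
```

Writing each distance as a short list of function pairs mirrors the separable trace structure that the estimator exploits: every term only needs a function of one covariance matrix at a time.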

Abstract

This work considers the problem of estimating the distance between two covariance matrices directly from the data. In particular, we are interested in the family of distances that can be expressed as sums of traces of functions that are separately applied to each covariance matrix. This family of distances is particularly useful because it takes into consideration the fact that covariance matrices lie in the Riemannian manifold of positive definite matrices, and it includes a variety of commonly used metrics, such as the Euclidean distance, Jeffreys' divergence, and the log-Euclidean distance. Moreover, we conduct a statistical analysis of the asymptotic behavior of this class of distance estimators. Specifically, we present a central limit theorem that establishes the asymptotic Gaussianity of these estimators and provides closed-form expressions for the corresponding means and variances. Empirical evaluations demonstrate the superiority of our proposed consistent estimator over conventional plug-in estimators in multivariate analytical contexts. Additionally, the central limit theorem derived in this study provides a robust statistical framework to assess the accuracy of these estimators.
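The gap between plug-in and consistent estimation already shows up for the Euclidean distance. The sketch below is a hedged illustration, not the paper's closed-form estimator: it relies on the standard first-order identity $\mathbb{E}\big[\frac{1}{M}\mathrm{tr}\,\hat{\mathbf{R}}_j^2\big] \approx \frac{1}{M}\mathrm{tr}\,\mathbf{R}_j^2 + c_j\big(\frac{1}{M}\mathrm{tr}\,\mathbf{R}_j\big)^2$ for zero-mean Gaussian data, and subtracts the estimated bias term from each squared term of $\frac{1}{M}\|\hat{\mathbf{R}}_1-\hat{\mathbf{R}}_2\|_F^2$. The cross term needs no correction because the two sample covariance matrices are built from independent samples. The AR(1)-type covariances, the $\rho$ values, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def toeplitz_cov(M, rho):
    """AR(1)-type covariance with entries rho^|i-k| (assumed test model)."""
    idx = np.arange(M)
    return rho ** np.abs(np.subtract.outer(idx, idx))


def scm(R, N, rng):
    """Sample covariance matrix of N zero-mean real Gaussian samples with covariance R."""
    X = np.linalg.cholesky(R) @ rng.standard_normal((R.shape[0], N))
    return X @ X.T / N


def eu_plugin(S1, S2):
    """Plug-in Euclidean distance (1/M)||S1 - S2||_F^2."""
    return np.linalg.norm(S1 - S2, "fro") ** 2 / S1.shape[0]


def eu_corrected(S1, S2, N1, N2):
    """Bias-corrected Euclidean distance: subtract c_j ((1/M) tr S_j)^2 once per
    matrix. The cross term -(2/M) tr[S1 S2] is left untouched because S1 and S2
    come from independent samples."""
    M = S1.shape[0]
    c1, c2 = M / N1, M / N2
    return (eu_plugin(S1, S2)
            - c1 * (np.trace(S1) / M) ** 2
            - c2 * (np.trace(S2) / M) ** 2)


rng = np.random.default_rng(0)
M, N = 100, 200                      # c = M/N = 0.5
R1, R2 = toeplitz_cov(M, 0.5), toeplitz_cov(M, 0.2)
S1, S2 = scm(R1, N, rng), scm(R2, N, rng)
print("true     ", eu_plugin(R1, R2))
print("plug-in  ", eu_plugin(S1, S2))
print("corrected", eu_corrected(S1, S2, N, N))
```

With these settings ($c_1=c_2=0.5$ and $\frac{1}{M}\mathrm{tr}\,\mathbf{R}_j=1$), the plug-in value should overshoot the true distance by roughly $c_1+c_2=1$, several times the true distance of about 0.3, while the corrected value should land close to it.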

Paper Structure

This paper contains 29 sections, 6 theorems, 188 equations, and 4 figures.

Key Result

Proposition 1

Under (As1)-(As4) we have almost surely. Here, $\hat{h}_{j}^{(l)}(z)$ denotes the random function where $\hat{\omega}_{j}\left( z\right)$ denotes the consistent estimator of $\omega_{j}\left( z\right)$ given by and where $\hat{\omega}_{j}^{\prime}\left( z\right)$ represents its derivative, namely Furthermore, the right-hand side of (eq:asymptEqF) has bounded spectral norm with probability o
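Proposition 1 builds on the contour-integral (resolvent) representation of spectral functions. The sketch below only checks the deterministic identity $\frac{1}{M}\mathrm{tr}\, f(\mathbf{R}) = \frac{1}{2\pi \mathrm{i}}\oint f(z)\,\frac{1}{M}\mathrm{tr}\big[(z\mathbf{I}-\mathbf{R})^{-1}\big]\,dz$ for a known $\mathbf{R}$, using the trapezoidal rule on a circle that encloses the spectrum and stays in the right half-plane, so that $\log z$ is analytic on and inside it. It does not reproduce the paper's estimator, which replaces this population resolvent by sample quantities through $\hat{\omega}_j(z)$; the function `trace_f_contour`, its parameters, and the test covariance are hypothetical.

```python
import numpy as np


def trace_f_contour(R, f, center, radius, K=512):
    """Approximate (1/M) tr f(R) by the Cauchy integral
        (1/(2*pi*i)) * contour_integral f(z) (1/M) tr[(z I - R)^{-1}] dz
    over the counterclockwise circle |z - center| = radius, which must enclose
    the spectrum of R while f stays analytic on and inside it. Uses the
    K-point trapezoidal rule, which converges geometrically for such integrands."""
    M = R.shape[0]
    I = np.eye(M)
    total = 0.0 + 0.0j
    for k in range(K):
        w = radius * np.exp(2j * np.pi * k / K)        # z - center on the circle
        z = center + w
        g = np.trace(np.linalg.inv(z * I - R)) / M     # normalized resolvent trace
        total += f(z) * g * w                          # dz/(2*pi*i) = w * dtheta/(2*pi)
    return (total / K).real


M = 30
idx = np.arange(M)
R = 0.5 ** np.abs(np.subtract.outer(idx, idx))          # assumed test covariance
lam = np.linalg.eigvalsh(R)
center = 0.5 * (lam[0] + lam[-1])
radius = 0.5 * (lam[-1] - lam[0]) + 0.5 * lam[0]       # circle stays in Re(z) > 0
for name, f in [("log", np.log), ("square", np.square), ("inverse", lambda x: 1.0 / x)]:
    print(name, trace_f_contour(R, f, center, radius), np.mean(f(lam)))
```

The circle is chosen so that its leftmost point is $\lambda_{\min}/2 > 0$: this keeps the branch cut of $\log z$ and the pole of $1/z$ outside the contour while still enclosing every eigenvalue.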

Figures (4)

  • Figure 1: Histograms of the empirical distribution (in blue) and the asymptotic descriptors (in orange) of the EU, KL, and LE metrics, arranged from top to bottom, respectively, for fixed $\rho_1 = 0.8, \rho_2 = 0.4$.
  • Figure 2: Relative MSE of the different metrics in scenarios (a)-(d) as $N=N_1=N_2$ grows ($x$-axis). In all these curves, the system dimension $M$ is scaled proportionally, so that $c = M/N$ is constant.
  • Figure 3: Empirical (solid lines) and theoretical (dashed lines) probability (y-axis) of correctly clustering six sample covariance matrices into three groups, for growing $M$ (x-axis) and fixed $\rho_1 = \rho_2=0.3, \rho_3=\rho_4 = 0.5, \rho_5 = \rho_6 = 0.7$, using the proposed estimators.
  • Figure 4: Probability (y-axis) of correctly clustering six SCMs into three groups for growing $M$ (x-axis). Results for the traditional plug-in estimator are depicted in dashed lines and those for the consistent estimator in solid lines.
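Figures 3 and 4 evaluate how often six sample covariance matrices (SCMs) are correctly grouped into three clusters from their pairwise distances. This summary does not state which clustering rule is used, so the sketch below assumes a naive average-linkage agglomerative procedure on a precomputed distance matrix, fed with plug-in LE distances in a deliberately low-dimensional setting ($c = M/N = 0.02$) where plug-in bias is mild. In the high-dimensional regimes studied in the paper, one would pass a matrix of consistent distance estimates instead; all function names and parameters here are hypothetical.

```python
import numpy as np


def toeplitz_cov(M, rho):
    """AR(1)-type covariance with entries rho^|i-k| (assumed test model)."""
    idx = np.arange(M)
    return rho ** np.abs(np.subtract.outer(idx, idx))


def log_spd(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    lam, U = np.linalg.eigh(S)
    return (U * np.log(lam)) @ U.T


def le_distance_matrix(scms):
    """Pairwise plug-in log-Euclidean distances (1/M)||log S_a - log S_b||_F^2."""
    logs = [log_spd(S) for S in scms]
    M, n = scms[0].shape[0], len(scms)
    D = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            D[a, b] = D[b, a] = np.linalg.norm(logs[a] - logs[b], "fro") ** 2 / M
    return D


def average_linkage(D, n_clusters):
    """Naive average-linkage agglomerative clustering on a precomputed distance
    matrix: repeatedly merge the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Returns a list of index lists."""
    clusters = [[i] for i in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]          # b > a, so index a is still valid
    return clusters


rng = np.random.default_rng(1)
M, N = 20, 1000                                   # c = M/N = 0.02 (low-dimensional)
rhos = [0.3, 0.3, 0.5, 0.5, 0.7, 0.7]             # three groups of two, as in Figure 3
scms = []
for rho in rhos:
    X = np.linalg.cholesky(toeplitz_cov(M, rho)) @ rng.standard_normal((M, N))
    scms.append(X @ X.T / N)
D = le_distance_matrix(scms)
print(average_linkage(D, 3))
```

Separating the distance computation from the clustering step makes it straightforward to swap the plug-in `le_distance_matrix` for any other estimator of the same distances and compare clustering accuracy, which is the kind of comparison Figure 4 reports.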

Theorems & Definitions (9)

  • Remark 1
  • Remark 2
  • Proposition 1
  • Theorem 1
  • Remark 3
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 2