Distributional stability of sparse inverse covariance matrix estimators

Renjie Chen; Huifu Xu; Henryk Zähle

Abstract

Estimating the inverse of the covariance matrix of a random vector, also known as the precision matrix, from empirical data is a widely discussed problem in finance and engineering. In data-driven problems, the empirical data may be "contaminated". This raises the question of whether the approximate precision matrix is statistically reliable. In this paper, we concentrate on a widely studied sparse estimator of the precision matrix and investigate the issue from the perspective of distributional stability. Specifically, we derive an explicit local Lipschitz bound for the distance between the distributions of the sparse estimator under two different distributions (regarded as the true data distribution and the distribution of the "contaminated" data). The distance is measured by the Kantorovich metric on the set of all probability measures on a matrix space. We also present analogous results for the standard estimators of the covariance matrix and its eigenvalues. Furthermore, we discuss several applications and conduct numerical experiments.
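For orientation, the Kantorovich metric mentioned in the abstract is commonly written in the following dual (Kantorovich-Rubinstein) form. The symbols $E$, $d$, $\mu$, $\nu$ and $d_{\rm K}$ are generic placeholders assumed here for illustration and need not match the paper's notation; $\mu$ and $\nu$ are assumed to have finite first moments on the metric space $(E,d)$.

```latex
d_{\rm K}(\mu,\nu)
  \;=\; \sup\Big\{\, \Big|\int_E f \,d\mu - \int_E f \,d\nu\Big| \;:\;
        f\colon E\to\mathbb{R},\ |f(x)-f(y)|\le d(x,y)\ \text{for all } x,y\in E \Big\}
```

In the setting of the abstract, $E$ would be a matrix space and $\mu$, $\nu$ the distributions of the estimator under the true and the "contaminated" data distribution.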

Paper Structure (37 sections, 22 theorems, 108 equations, 4 figures)


Introduction
Basic notation
A criterion for distributional stability of general estimators
Optimization problem underlying the estimator $S_N$
Formulation and existence of a unique minimizer
Continuity of the minimizer in $\Sigma$
Lipschitz continuity of the minimizer in $\Sigma$
Distributional stability of $S_N$ and of $\Sigma_N$ and its eigenvalues
Applications and numerical experiments
Distributional stability of the eigenvalues of the sample covariance matrix
Distributional sensitivity of the inverse of the sample covariance matrix
Gaussian graphical model selection and its application in cancer genetic network inference
Distributional stability of the sparse estimator in Gaussian graphical models
Numerical experiment in cancer genetic network inference
Portfolio optimization
...and 22 more sections

Key Result

Theorem 3.1

Assume that there exist constants $\kappa_1,\kappa_2\in\mathbb{R}_{+}$ such that holds true for all $\hat{\bm x}=(\hat{x}^1,\ldots,\hat{x}^N),\,\tilde{\bm x}=(\tilde{x}^1,\ldots,\tilde{x}^N)\in X^{N}$. Then for all $P,Q\in\mathscr{P}_2(X)$ and $N\in\mathbb{N}$.

Figures (4)

Figure 1: $\widehat{\mathsf {d l}}_{{\rm K},M}(\mathbb{P}^P \circ \widehat{\lambda}_{i,N}^{-1},\mathbb{P}^Q \circ \widehat{\lambda}_{i,N}^{-1})$ (y-axis) as a function of $\widehat{\mathsf {d l}}_{2}(P,Q)$, $Q\in\mathscr{Q}$ (x-axis), for different sample sizes $N$.
Figure 2: $\widehat{\mathsf {d l}}_{{\rm K},M}(\mathbb{P}^P \circ \widehat{S}_N^{-1},\mathbb{P}^Q \circ \widehat{S}_N^{-1})$ (y-axis) as a function of $\widehat{\mathsf {d l}}_{2}(P,Q)$, $Q\in\mathscr{Q}$ (x-axis), for different sample sizes $N$.
Figure 3: $\widehat{\mathsf {d l}}_{{\rm K},M}(\mathbb{P}^P\circ\widetilde{S}_N^{-1},\mathbb{P}^Q\circ\widetilde{S}_N^{-1})$ and structure match accuracy (y-axis) as a function of $\widehat{\mathsf {d l}}_{2}(P,Q)$ (x-axis) for different choices of $\lambda$ and different sample sizes $N$.
Figure 4: $\widehat{\mathsf {d l}}_{{\rm K},M}(\mathbb{P}^P \circ \widehat{v}_N^{-1},\mathbb{P}^Q \circ \widehat{v}_N^{-1})$ as a function of $\widehat{\mathsf {d l}}_{2}(P,Q)$, $Q\in\mathscr{Q}$.
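
The kind of quantity plotted in Figure 1 can be illustrated with a small Monte Carlo sketch: simulate the largest eigenvalue of the sample covariance matrix under a clean and a contaminated distribution, then compare the two simulated distributions in the one-dimensional Kantorovich (Wasserstein-1) distance. Everything specific below is an assumption made for illustration and is not taken from the paper: the standard normal model, the contamination scheme (`eps`, `scale`), the dimension `d`, the sample sizes, and the helper names `top_eigenvalue_samples` and `kantorovich_1d`. The paper's estimator $\widehat{\mathsf {d l}}_{{\rm K},M}$ may be constructed differently.

```python
import numpy as np


def top_eigenvalue_samples(sampler, n_obs, n_reps, rng):
    """Simulate n_reps copies of the largest eigenvalue of the sample covariance."""
    out = np.empty(n_reps)
    for r in range(n_reps):
        x = sampler(n_obs, rng)                 # data matrix of shape (n_obs, d)
        cov = np.cov(x, rowvar=False)           # d x d sample covariance matrix
        out[r] = np.linalg.eigvalsh(cov)[-1]    # eigvalsh sorts eigenvalues ascending
    return out


def kantorovich_1d(a, b):
    """Wasserstein-1 distance between two equal-size empirical measures on the real line."""
    a, b = np.sort(a), np.sort(b)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # For equal-size samples the optimal coupling matches order statistics.
    return float(np.mean(np.abs(a - b)))


def clean(n, rng, d=3):
    """Hypothetical 'true' distribution P: standard normal in dimension d."""
    return rng.standard_normal((n, d))


def contaminated(n, rng, d=3, eps=0.1, scale=5.0):
    """Hypothetical contaminated distribution Q: a fraction eps of rows is inflated."""
    x = rng.standard_normal((n, d))
    mask = rng.random(n) < eps                  # rows chosen for contamination
    x[mask] *= scale
    return x


rng = np.random.default_rng(0)
lam_p = top_eigenvalue_samples(clean, n_obs=50, n_reps=200, rng=rng)
lam_q = top_eigenvalue_samples(contaminated, n_obs=50, n_reps=200, rng=rng)
dist = kantorovich_1d(lam_p, lam_q)
print(f"estimated Kantorovich distance between eigenvalue distributions: {dist:.3f}")
```

Repeating the last four lines for several values of `eps` and `n_obs` would give curves of the same flavour as those in the figures, with the caveat that the x-axis there is a distance between $P$ and $Q$ rather than a contamination level.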

Theorems & Definitions (31)

Theorem 3.1
Proposition 3.1
Proposition 4.1
Proposition 4.2
Theorem 4.1
Proposition 4.3
Theorem 4.2
Theorem 5.1
Theorem 5.2
Theorem 5.3
...and 21 more
