Inverse Covariance and Partial Correlation Matrix Estimation via Joint Partial Regression

Samuel Erickson, Tobias Rydén

TL;DR

This work tackles the problem of estimating sparse high-dimensional inverse covariance (precision) and partial correlation matrices from data. It introduces a two-stage, convex joint partial regression framework that simultaneously estimates the precision matrix and the partial-correlation matrix under a positive semidefinite constraint, using an initial Lasso-based step to obtain residual variances. The authors derive non-asymptotic error rates for sub-Gaussian data: $\|\widehat{Q}-Q^\star\|_F \lesssim \sqrt{\frac{s\log p}{n}}$ and $\|\widehat{\Omega}-\Omega^\star\|_F \lesssim \sqrt{\frac{(s+p)\log p}{n}}$, and present an efficient proximal-splitting algorithm (pd3o) with per-iteration cost $O(p^3)$ suitable for large $p$. Empirical results on synthetic and stock-market data show the method consistently improves estimation accuracy over the graphical lasso and yields interpretable networks reflecting sector structure. The approach offers a scalable, theoretically grounded alternative for high-dimensional graphical-model estimation with practical impact in genomics, finance, and beyond.
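The summary does not say how the positive semidefinite constraint is handled inside the solver. One common building block, shown here only as an assumed illustration, is Euclidean projection onto the positive semidefinite cone. It needs a full eigendecomposition, which costs $O(p^3)$ and so matches the stated per-iteration order. The function name `project_psd` is hypothetical.

```python
import numpy as np

def project_psd(a):
    """Frobenius-norm projection of a square matrix onto the PSD cone."""
    sym = (a + a.T) / 2.0                    # symmetrize first
    w, v = np.linalg.eigh(sym)               # O(p^3) eigendecomposition
    return (v * np.clip(w, 0.0, None)) @ v.T # drop negative eigenvalues

# Indefinite example matrix with eigenvalues 3 and -1.
a = np.array([[1.0, 2.0],
              [2.0, 1.0]])
projected = project_psd(a)
```

In this example the negative eigenvalue is set to zero and only the component along the eigenvector for eigenvalue 3 is kept.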

Abstract

We present a method for estimating sparse high-dimensional inverse covariance and partial correlation matrices that exploits the connection between the inverse covariance matrix and linear regression. It is a two-stage procedure in which each feature is regressed on all other features while positive semi-definiteness is enforced simultaneously. We derive non-asymptotic estimation rates for both inverse covariance and partial correlation matrix estimation. An efficient proximal splitting algorithm for numerically computing the estimate is also derived. The effectiveness of the proposed method is demonstrated on both synthetic and real-world data.
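The connection between the inverse covariance matrix and linear regression mentioned above can be checked numerically. The sketch below uses the standard population identities: regressing feature $j$ on the others gives coefficients $-\Omega_{jk}/\Omega_{jj}$ and residual variance $1/\Omega_{jj}$. It is not the authors' estimator. The function name, the example matrix, and the sign convention chosen for the partial correlations are assumptions and may differ from the paper's definition of $Q$.

```python
import numpy as np

def precision_to_regression(omega):
    """Regression coefficients, residual variances and partial
    correlations implied by a precision matrix (population identities)."""
    d = np.diag(omega)
    coef = -omega / d[:, None]            # coef[j, k] = -omega[j, k] / omega[j, j]
    np.fill_diagonal(coef, 0.0)           # a feature is not regressed on itself
    resid_var = 1.0 / d                   # Var(X_j | all other features)
    partial_corr = -omega / np.sqrt(np.outer(d, d))
    np.fill_diagonal(partial_corr, 1.0)   # assumed convention: unit diagonal
    return coef, resid_var, partial_corr

# Hypothetical sparse (tridiagonal) precision matrix.
omega = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
sigma = np.linalg.inv(omega)
coef, resid_var, partial_corr = precision_to_regression(omega)

# Population least squares of feature 0 on features 1 and 2.
j, rest = 0, [1, 2]
beta = np.linalg.solve(sigma[np.ix_(rest, rest)], sigma[rest, j])
cond_var = sigma[j, j] - sigma[j, rest] @ beta
```

Here `beta` agrees with `coef[0, rest]` and `cond_var` agrees with `resid_var[0]`, which is why a zero in the precision matrix corresponds to a zero regression coefficient.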

Paper Structure

This paper contains 26 sections, 4 theorems, 85 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Under the dimensionality, sub-Gaussianity and bounded-eigenvalue assumptions, there exist positive constants $c$, $C_1$ and $C_2$ such that Algorithm 1 (joint partial regression) with $\lambda = c\sqrt{\log(p)/n}$ outputs an estimate $\widehat{Q}$ of the partial correlation matrix that satisfies $\|\widehat{Q}-Q^\star\|_F \le C_1\sqrt{s\log(p)/n}$ and an estimate $\widehat{\Omega}$ of the precision matrix that satisfies $\|\widehat{\Omega}-\Omega^\star\|_F \le C_2\sqrt{(s+p)\log(p)/n}$, with probability at least …
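To get a feel for the scaling, the regularization level $\lambda = c\sqrt{\log(p)/n}$ from Theorem 4.1 and the two rates quoted in the TL;DR can be evaluated for given $n$, $p$ and $s$. This is a minimal sketch: the constants are not given in this summary, so `c` defaults to 1 purely as a placeholder, the rate terms omit $C_1$ and $C_2$, and the function name and example values are hypothetical.

```python
import math

def jpr_rate_terms(n, p, s, c=1.0):
    """Return (lambda, Q rate term, Omega rate term) up to unknown constants."""
    log_term = math.log(p) / n
    lam = c * math.sqrt(log_term)                # lambda = c * sqrt(log(p) / n)
    q_rate = math.sqrt(s * log_term)             # sqrt(s log(p) / n)
    omega_rate = math.sqrt((s + p) * log_term)   # sqrt((s + p) log(p) / n)
    return lam, q_rate, omega_rate

# Hypothetical problem sizes.
lam, q_rate, omega_rate = jpr_rate_terms(n=500, p=100, s=200)
```

The extra $p$ inside the precision-matrix rate means its term is never smaller than the partial-correlation term for the same $n$, $p$ and $s$.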

Figures (3)

  • Figure 1: Average Frobenius error versus number of features with $\pm 2 \times \text{SE}$ bands.
  • Figure 2: Average operator $\ell_2$-error versus number of features with $\pm 2 \times \text{SE}$ bands.
  • Figure 3: Stock market network as estimated by the proposed method.
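The quantities plotted in Figures 1 and 2 can be computed along the following lines: the Frobenius and operator (spectral) norm errors of each estimate, averaged over repetitions, with a band of two standard errors on either side. This is a sketch under assumptions. The function name, the synthetic "estimates" and the number of repetitions are made up for illustration and do not reproduce the paper's experiments.

```python
import numpy as np

def error_summary(estimates, truth):
    """Mean error and +/- 2 SE band in Frobenius and operator norm."""
    fro = np.array([np.linalg.norm(e - truth, "fro") for e in estimates])
    op = np.array([np.linalg.norm(e - truth, 2) for e in estimates])

    def band(x):
        mean = x.mean()
        se = x.std(ddof=1) / np.sqrt(len(x))   # standard error of the mean
        return mean, mean - 2.0 * se, mean + 2.0 * se

    return {"frobenius": band(fro), "operator": band(op)}

# Made-up example: noisy symmetric perturbations of an identity matrix.
rng = np.random.default_rng(0)
truth = np.eye(4)
estimates = []
for _ in range(20):
    noise = 0.1 * rng.standard_normal((4, 4))
    estimates.append(truth + (noise + noise.T) / 2.0)
summary = error_summary(estimates, truth)
```

Each entry of `summary` is a `(mean, lower, upper)` triple, which is the form needed to draw a line with an error band against the number of features.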

Theorems & Definitions (7)

  • Theorem 4.1
  • Corollary 4.2
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof
  • proof