CARE: Large Precision Matrix Estimation for Compositional Data

Shucong Zhang, Huiyuan Wang, Wei Lin

TL;DR

The paper proposes a composition adaptive regularized estimation (CARE) method for estimating the sparse basis precision matrix from compositional data. Its theory reveals an intriguing trade-off between identification and estimation, highlighting the blessing of dimensionality in compositional data analysis.

Abstract

High-dimensional compositional data are prevalent in many applications. The simplex constraint poses intrinsic challenges to inferring the conditional dependence relationships among the components forming a composition, as encoded by a large precision matrix. We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart, which is shown to be asymptotically identifiable under suitable sparsity assumptions. By exploiting this connection, we propose a composition adaptive regularized estimation (CARE) method for estimating the sparse basis precision matrix. We derive rates of convergence for the estimator and provide theoretical guarantees on support recovery and data-driven parameter tuning. Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis. In particular, in sufficiently high dimensions, the CARE estimator achieves minimax optimality and performs as well as if the basis were observed. We further discuss how our framework can be extended to handle data containing zeros, including sampling zeros and structural zeros. The advantages of CARE over existing methods are illustrated by simulation studies and an application to inferring microbial ecological networks in the human gut.
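
To make the link between the compositional and basis precision matrices concrete, here is a minimal sketch in plain Python. It rests on assumptions that are not stated in the abstract: the compositional precision matrix is taken to be the Moore-Penrose pseudoinverse of the centered log-ratio covariance $\mathbf{G}\bm\Sigma_0\mathbf{G}$ with $\mathbf{G}=\mathbf{I}-\mathbf{1}\mathbf{1}^\top/p$, and the basis precision matrix is restricted to a diagonal $\bm\Omega_0=\mathrm{diag}(w)$ so that no matrix inversion is needed. Under these assumptions the candidate $\bm\Omega_0-\bm\Omega_0\mathbf{1}\mathbf{1}^\top\bm\Omega_0/(\mathbf{1}^\top\bm\Omega_0\mathbf{1})$ can be checked against the four Penrose conditions. All function names are hypothetical, and the paper's exact definitions may differ.

```python
def matmul(a, b):
    """Plain-Python product of two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    p = len(values)
    return [[values[i] if i == j else 0.0 for j in range(p)] for i in range(p)]


def clr_covariance(w):
    """G Sigma_0 G for Sigma_0 = diag(1/w), where G = I - 11'/p (assumed form)."""
    p = len(w)
    g = [[(1.0 if i == j else 0.0) - 1.0 / p for j in range(p)] for i in range(p)]
    sigma0 = diag([1.0 / wi for wi in w])
    return matmul(matmul(g, sigma0), g)


def candidate_omega_c(w):
    """Omega_0 - Omega_0 1 1' Omega_0 / (1' Omega_0 1) for Omega_0 = diag(w)."""
    p, total = len(w), sum(w)
    return [[(w[i] if i == j else 0.0) - w[i] * w[j] / total
             for j in range(p)] for i in range(p)]


def is_pseudoinverse(a, b, tol=1e-10):
    """Check the four Penrose conditions for b being the pseudoinverse of a."""
    ab, ba = matmul(a, b), matmul(b, a)
    transpose = lambda m: [list(row) for row in zip(*m)]
    return (max_abs_diff(matmul(ab, a), a) < tol
            and max_abs_diff(matmul(ba, b), b) < tol
            and max_abs_diff(ab, transpose(ab)) < tol
            and max_abs_diff(ba, transpose(ba)) < tol)


def identification_gap(p):
    """Largest entry of |Omega_c - Omega_0| when Omega_0 is the p x p identity."""
    w = [1.0] * p
    return max_abs_diff(candidate_omega_c(w), diag(w))


if __name__ == "__main__":
    w = [1.0, 2.0, 3.0, 4.0, 0.5, 1.5]  # illustrative diagonal of Omega_0
    print("Penrose conditions hold:",
          is_pseudoinverse(clr_covariance(w), candidate_omega_c(w)))
    for p in (5, 50, 500):
        print(p, identification_gap(p))
```

In this toy setting the entrywise gap between the two matrices equals $1/p$ when $\bm\Omega_0$ is the identity, so it shrinks as the dimension grows. This is one simple way to read the "blessing of dimensionality" mentioned above, not a substitute for the paper's general result.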

Paper Structure

This paper contains 17 sections, 10 theorems, 30 equations, 2 figures, 4 tables.

Key Result

Theorem 1

The following relationship between $\bm\Omega_c$ and $\bm\Omega_0$ holds:

Figures (2)

  • Figure 1: The ROC curves for different methods in models (a)–(d) with $p=50,100,200,400$.
  • Figure 2: Microbial interaction networks identified by the CARE method for the (a) lean and (b) obese groups in the gut microbiome data. Positive and negative edges are displayed in green and red, respectively, with thicknesses proportional to their strengths. Node sizes are proportional to the relative abundances of genera among all samples.

Theorems & Definitions (10)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Lemma 1
  • Theorem 2
  • Corollary 1
  • Theorem 3
  • Theorem 4
  • Theorem 5