Cellwise and Casewise Robust Covariance in High Dimensions

Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

TL;DR

This work tackles robust covariance estimation in high-dimensional data that contain cellwise outliers, casewise outliers, and missing values. It develops cellRCov, a covariance estimator built on the decomposition $\boldsymbol\Sigma = \boldsymbol\Sigma_{X^k} + \boldsymbol\Sigma_{X^{\perp}}$, where a robust PCA step yields the principal subspace and the covariance of the robustly imputed residuals is estimated on its orthogonal complement, with ridge regularization for stability. The authors establish consistency and asymptotic normality, derive both casewise and cellwise influence functions, and demonstrate superior performance in simulations and real-data tasks such as anomaly detection and robust canonical correlation analysis (cellRCCA). The method offers a practical, scalable tool for robust multivariate analysis in high dimensions, with data-driven procedures to select the rank $k$ and the regularization parameter $\delta$. Overall, cellRCov enables reliable inference under complex contamination, expanding the toolkit for high-dimensional robust statistics.

Abstract

The sample covariance matrix is a cornerstone of multivariate statistics, but it is highly sensitive to outliers. These can be casewise outliers, such as cases belonging to a different population, or cellwise outliers, which are deviating cells (entries) of the data matrix. Recently some robust covariance estimators have been developed that can handle both types of outliers, but their computation is only feasible up to at most 20 dimensions. To remedy this we propose the cellRCov method, a robust covariance estimator that simultaneously handles casewise outliers, cellwise outliers, and missing data. It relies on a decomposition of the covariance on principal and orthogonal subspaces, leveraging recent work on robust PCA. It also employs a ridge-type regularization to stabilize the estimated covariance matrix. We establish some theoretical properties of cellRCov, including its casewise and cellwise influence functions as well as consistency and asymptotic normality. A simulation study demonstrates the superior performance of cellRCov in contaminated and missing data scenarios. Furthermore, its practical utility is illustrated in a real-world application to anomaly detection. We also construct and illustrate the cellRCCA method for robust and regularized canonical correlation analysis.
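
The decomposition of the covariance on principal and orthogonal subspaces can be illustrated with a minimal sketch. It uses classical (non-robust) PCA in place of the robust PCA step, so it is not the cellRCov algorithm itself. The function names `split_covariance` and `ridge`, and the choice of shrinking toward the diagonal, are assumptions made for illustration only; the form of the authors' ridge-type regularization is not given here.

```python
import numpy as np

def split_covariance(X, k):
    """Split the classical sample covariance into principal and orthogonal parts."""
    Xc = X - X.mean(axis=0)                # center each column
    S = Xc.T @ Xc / (X.shape[0] - 1)       # classical sample covariance
    _, eigvecs = np.linalg.eigh(S)         # eigenvalues come in ascending order
    V = eigvecs[:, -k:]                    # top-k eigenvectors span the principal subspace
    P = V @ V.T                            # projector onto the principal subspace
    Q = np.eye(S.shape[0]) - P             # projector onto its orthogonal complement
    return S, P @ S @ P, Q @ S @ Q         # S, Sigma_{X^k}, Sigma_{X^perp}

def ridge(S_perp, delta):
    """Hypothetical ridge step: shrink the orthogonal part toward its diagonal."""
    return (1 - delta) * S_perp + delta * np.diag(np.diag(S_perp))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 30))              # fewer cases than variables
S, S_k, S_perp = split_covariance(X, k=3)
Sigma = S_k + ridge(S_perp, delta=0.1)     # regularized estimate

print(np.allclose(S, S_k + S_perp))        # the two parts add up to S
print(np.linalg.eigvalsh(Sigma).min() > 0) # regularization restores invertibility
```

Because the principal subspace is spanned by eigenvectors of `S`, the cross terms between the two projectors vanish and the two parts sum to `S` exactly. With fewer cases than variables `S` is singular, which is why some form of regularization is needed before inverting the estimate.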

Paper Structure

This paper contains 16 sections, 12 theorems, 126 equations, 13 figures.

Key Result

Theorem 1

The casewise and cellwise influence functions of $\operatorname{vec}(\boldsymbol\Sigma)$ are … All of these terms are computed in Propositions 1 to 3 in Section A of the Supplementary Material. In particular … in which $\otimes$ is the Kronecker product and $\boldsymbol K_{p,k}$ is a $pk \times pk$ permutation matrix.
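
The two matrix tools named in the theorem can be made concrete with a short sketch. It assumes that $\boldsymbol K_{p,k}$ is the usual commutation matrix, that is, the permutation matrix mapping $\operatorname{vec}(\boldsymbol A)$ to $\operatorname{vec}(\boldsymbol A^T)$ for a $p \times k$ matrix $\boldsymbol A$; the statement above only says it is a permutation matrix. The helper names `vec` and `commutation_matrix` are illustrative.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.reshape(-1, order="F")

def commutation_matrix(p, k):
    """Return the pk x pk matrix K with K @ vec(A) == vec(A.T) for any p x k matrix A."""
    K = np.zeros((p * k, p * k))
    for i in range(p):
        for j in range(k):
            # A[i, j] sits at position j*p + i in vec(A) and at i*k + j in vec(A.T)
            K[i * k + j, j * p + i] = 1.0
    return K

p, k = 3, 2
A = np.arange(p * k, dtype=float).reshape(p, k)
K = commutation_matrix(p, k)

# K reorders vec(A) into vec(A.T), and is orthogonal like every permutation matrix.
print(np.array_equal(K @ vec(A), vec(A.T)))
print(np.array_equal(K @ K.T, np.eye(p * k)))

# Kronecker identity: vec(L X R) = (R.T kron L) vec(X).
rng = np.random.default_rng(1)
L, X, R = rng.normal(size=(4, p)), rng.normal(size=(p, k)), rng.normal(size=(k, 5))
print(np.allclose(np.kron(R.T, L) @ vec(X), vec(L @ X @ R)))
```

Identities of this kind are what allow derivatives and influence functions of matrix-valued functionals to be written in vectorized form.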

Figures (13)

  • Figure 1: The casewise (left) and cellwise (right) IF of $\sigma_{11}$ at the bivariate normal $H_0$.
  • Figure 2: Average KL attained by cellRCov, RCov, RSpearman, and caseMRCD in the presence of cellwise outliers, casewise outliers, or both for the A09 covariance model and dimensions $p$ in $\lbrace30,60,120\rbrace$.
  • Figure 3: Average KL attained by cellRCov, RCov, RSpearman, and caseMRCD in the presence of cellwise outliers, casewise outliers, or both for the A09 covariance model and dimensions $p$ in $\lbrace30,60,120\rbrace$, with 20% of missing cells.
  • Figure 4: The 115 DRC measurements in $[m\Omega]$ corresponding to five spot welding points. Red curves are cases where expulsion occurred.
  • Figure 5: ROC curves of the detection rules using cellRCov, RCov, RSpearman, and caseMRCD.
  • ...and 8 more figures
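
The KL values reported in Figures 2 and 3 measure how far an estimated covariance matrix is from the true one. The sketch below assumes the Kullback-Leibler discrepancy between two centered Gaussian distributions, written without the conventional factor 1/2; the exact normalization used in the paper is not stated in this summary, and the function name `kl_discrepancy` is illustrative.

```python
import numpy as np

def kl_discrepancy(sigma_hat, sigma):
    """trace(sigma_hat sigma^{-1}) - log det(sigma_hat sigma^{-1}) - p (assumed form)."""
    p = sigma.shape[0]
    M = np.linalg.solve(sigma, sigma_hat)  # sigma^{-1} sigma_hat, same trace and determinant
    sign, logdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("sigma_hat and sigma must both be positive definite")
    return np.trace(M) - logdet - p

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
print(kl_discrepancy(sigma, sigma))        # zero (up to rounding) for a perfect estimate
print(kl_discrepancy(2 * sigma, sigma))    # positive when the scale is off
```

The discrepancy is zero only when the estimate equals the true matrix, and it requires an invertible estimate, which is one practical reason to regularize in high dimensions.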

Theorems & Definitions (23)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • Lemma 1: Implicit Function Theorem
  • Proof of Proposition 1
  • Proposition 2
  • Proof of Proposition 2
  • Proposition 3
  • Proof of Proposition 3
  • ...and 13 more