High-dimensional Clustering and Signal Recovery under Block Signals
Wu Su, Yumou Qiu
TL;DR
This work develops computationally efficient, minimax-optimal methods for high-dimensional clustering and signal recovery when the cluster means differ on contiguous blocks of features. It treats two regimes separately, using CFA-PCA for sparse block signals and MA-PCA for dense block signals, and exploits block contiguity to gain statistical power under general sub-Gaussian noise and bandable covariances. The authors establish both statistical and computational minimax lower bounds, which reveal phase transitions between regions of impossibility and possibility (the latter under polynomial-time constraints), and show that CFA-PCA and MA-PCA attain the computational bounds in their respective regimes. Extensions to tensor data, extensive simulations, and a real-world case study on global temperature change support the practical value of exploiting block structure in high-dimensional clustering and signal identification.
Abstract
This paper studies computationally efficient methods and their minimax optimality for high-dimensional clustering and signal recovery under block signal structures. We propose two sets of methods, cross-block feature aggregation PCA (CFA-PCA) and moving average PCA (MA-PCA), designed for sparse and dense block signals, respectively. Both methods adapt to the block signal structure and are applicable to non-Gaussian data with heterogeneous variances and non-diagonal covariance matrices. Specifically, the CFA method uses a block-wise U-statistic to aggregate and select block signals non-parametrically from data with unknown cluster labels. We show that the proposed methods are consistent for both clustering and signal recovery under mild conditions and under weaker signal strengths than required by existing methods that ignore the block structure of signals. Furthermore, we derive both statistical and computational minimax lower bounds (SMLB and CMLB) for high-dimensional clustering and signal recovery under block signals, where the CMLBs are restricted to algorithms with polynomial computational complexity. The minimax boundaries partition signals into regions of impossibility and possibility: no algorithm (respectively, no polynomial-time algorithm) can achieve consistent clustering or signal recovery if the signals fall into the statistical (respectively, computational) region of impossibility. We show that the proposed CFA-PCA and MA-PCA methods achieve the CMLBs in the sparse and dense block signal regimes, respectively, indicating that they are computationally minimax optimal. A tuning parameter selection method is proposed based on post-clustering signal recovery results. Simulation studies are conducted to evaluate the proposed methods. A case study on global temperature change demonstrates their utility in practice.
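The intuition behind smoothing before PCA for dense block signals can be illustrated with a small simulation. The sketch below is not the authors' MA-PCA procedure: the function names (`moving_average`, `pca_two_clusters`), the window width `h`, the two-cluster Gaussian data, and the sign-based cluster assignment are all assumptions made for illustration. It only shows that averaging adjacent features shrinks independent per-feature noise while leaving a contiguous mean shift largely intact, after which the leading principal component separates the clusters.

```python
import numpy as np

def moving_average(X, h):
    """Average each row over a sliding window of h adjacent features."""
    kernel = np.ones(h) / h
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, X
    )

def pca_two_clusters(X):
    """Split samples by the sign of their score on the leading principal component."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, 0] > 0).astype(int)

# Hypothetical simulated data: two clusters whose means differ only
# on one contiguous block of features (columns 100..199).
rng = np.random.default_rng(0)
n, p, h = 200, 400, 20
labels = rng.integers(0, 2, size=n)
mu = np.zeros(p)
mu[100:200] = 0.5
X = rng.standard_normal((n, p)) + np.outer(2 * labels - 1, mu)

# Smooth across adjacent features, then cluster with PCA.
pred = pca_two_clusters(moving_average(X, h))

# Cluster labels are only identified up to swapping, so take the better match.
agree = np.mean(pred == labels)
accuracy = max(agree, 1 - agree)
print(f"clustering accuracy: {accuracy:.3f}")
```

In this toy setting the window width is fixed by hand; the abstract instead describes selecting tuning parameters from post-clustering signal recovery results, which this sketch does not attempt.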
