Covariance Scattering Transforms

Andrea Cavallo, Ayushman Raghuvanshi, Sundeep Prabhakar Chepuri, Elvin Isufi

TL;DR

The paper tackles robust covariance-based representations without supervision. It introduces Covariance Scattering Transforms (CSTs), deep untrained networks that sequentially apply covariance wavelets (filters localized in the covariance spectrum) to the input data, interleaved with nonlinearities, to produce hierarchical embeddings, with a pruning mechanism for computational and memory efficiency. It proves permutation equivariance and stability to finite-sample covariance estimation errors, with bounds that scale as $O(1/\sqrt{T})$ in the number of samples $T$ and are less sensitive to close covariance eigenvalues than PCA. Empirically, CSTs yield stable, competitive age-prediction performance from cortical thickness across four datasets, outperforming PCA and rivaling VNNs while requiring no training.

Abstract

Machine learning and data processing techniques relying on covariance information are widespread as they identify meaningful patterns in unsupervised and unlabeled settings. As a prominent example, Principal Component Analysis (PCA) projects data points onto the eigenvectors of their covariance matrix, capturing the directions of maximum variance. This mapping, however, falls short in two directions: it fails to capture information in low-variance directions, relevant when, e.g., the data contains high-variance noise; and it provides unstable results in low-sample regimes, especially when covariance eigenvalues are close. CoVariance Neural Networks (VNNs), i.e., graph neural networks using the covariance matrix as a graph, show improved stability to estimation errors and learn more expressive functions in the covariance spectrum than PCA, but require training and operate in a labeled setup. To get the benefits of both worlds, we propose Covariance Scattering Transforms (CSTs), deep untrained networks that sequentially apply filters localized in the covariance spectrum to the input data and produce expressive hierarchical representations via nonlinearities. We define the filters as covariance wavelets that capture specific and detailed covariance spectral patterns. We improve CSTs' computational and memory efficiency via a pruning mechanism, and we prove that their error due to finite-sample covariance estimations is less sensitive to close covariance eigenvalues compared to PCA, improving their stability. Our experiments on age prediction from cortical thickness measurements on 4 datasets collecting patients with neurodegenerative diseases show that CSTs produce stable representations in low-data settings, as VNNs but without any training, and lead to comparable or better predictions w.r.t. more complex learning models.
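The pipeline described in the abstract (spectral filters on the covariance, a nonlinearity, and aggregation across layers) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the wavelet kernel (differences of dyadic powers of the normalized eigenvalues), the absolute-value nonlinearity, the mean aggregation, and all function names are assumptions made for the example, and pruning is omitted.

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance of X with shape (N features, T samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T / (X.shape[1] - 1)

def wavelet_bank(C, num_scales=3):
    """Build spectral filters H_j = V diag(h_j(lam)) V^T on the covariance C.

    ASSUMPTION: the band-pass kernels below are a diffusion-wavelet-style
    choice used only for illustration; the paper's wavelets may differ.
    """
    lam, V = np.linalg.eigh(C)
    lam = np.clip(lam, 0.0, None) / max(lam.max(), 1e-12)  # scale to [0, 1]
    filters = []
    for j in range(num_scales):
        h = lam ** (2 ** j) - lam ** (2 ** (j + 1))  # response at scale j
        filters.append((V * h) @ V.T)                # V diag(h) V^T
    return filters

def cst(x, filters, depth=2):
    """Scattering tree: filter, nonlinearity, then aggregate each node.

    ASSUMPTION: abs() as nonlinearity and mean() as aggregation.
    """
    layer = [x]
    coeffs = [x.mean()]
    for _ in range(depth):
        layer = [np.abs(H @ z) for z in layer for H in filters]
        coeffs.extend(z.mean() for z in layer)
    return np.array(coeffs)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))            # N = 10 features, T = 200 samples
filters = wavelet_bank(sample_covariance(X))
phi = cst(X[:, 0], filters)                   # embedding of one data point
```

With 3 scales and depth 2, the tree has 1 + 3 + 9 nodes, so `phi` holds 13 coefficients. No parameter is trained at any step; everything is determined by the sample covariance.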

Paper Structure

This paper contains 42 sections, 9 theorems, 89 equations, 12 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Consider a CST $\mathbf{\Phi}$ computed from a dataset $\mathbf{X} \in \mathbb{R}^{N\times T}$, and a CST $\hat{\mathbf{\Phi}}$ computed from a dataset $\hat{\mathbf{X}} = \mathbf{\Pi}\mathbf{X}$ given permutation matrix $\mathbf{\Pi} \in \mathbb{R}^{N\times N}$. If $U$ is permutation equiv
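The theorem statement is cut off here, so the snippet below does not reproduce its conclusion. It only checks numerically the basic property such a result relies on: permuting the features of the data permutes the output of a covariance spectral filter in the same way, and a mean aggregation is unchanged. The filter response `h` and the helper name `spectral_filter` are hypothetical choices for illustration.

```python
import numpy as np

def spectral_filter(C, h):
    """Return V diag(h(lam)) V^T for the eigendecomposition C = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(C)
    return (V * h(lam)) @ V.T

rng = np.random.default_rng(0)
N, T = 6, 50
X = rng.standard_normal((N, T))
P = np.eye(N)[rng.permutation(N)]      # permutation matrix Pi
X_hat = P @ X                          # permuted dataset

C = np.cov(X)                          # rows are variables
C_hat = np.cov(X_hat)                  # equals P C P^T

h = lambda lam: np.exp(-lam)           # hypothetical filter response
x = X[:, 0]
out = np.abs(spectral_filter(C, h) @ x)
out_hat = np.abs(spectral_filter(C_hat, h) @ (P @ x))

equivariant = np.allclose(out_hat, P @ out)             # filter output is permuted
invariant_mean = np.isclose(out_hat.mean(), out.mean()) # mean aggregation is unchanged
```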

Figures (12)

  • Figure 1: Filters on the covariance eigenvalues $w$ and their scaled versions $\lambda$ (see Method section for details). PCA acts as a high-pass filter selecting only the top $k$ eigenvalues (here $k=5$), whereas covariance wavelets provide more complex and localized filter shapes.
  • Figure 2: CST features are obtained via sequential application of wavelets at different scales, interleaved with non-linear activations $\rho$ and aggregation operators $U$ to produce the final coefficients collected in $\mathbf{\Phi}$.
  • Figure 3: Age prediction Mean Absolute Error (MAE) and embedding Mean Squared Error (MSE) for an increasing number of samples for CSTs, VNNs and PCA on, from left to right, ADNI1, ADNI2, PPMI and Abide.
  • Figure 4: Impact of different thresholds $\tau$ for pruning on regression MAE, execution time and number of features on ADNI1.
  • Figure 5: MAE for different labeled data sizes on ADNI1 for $U$ as identity operator (left) and $U$ as mean operator (right).
  • ...and 7 more figures
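The Figure 1 caption describes PCA as a filter on the covariance eigenvalues that keeps only the top $k$. A small sketch of that view, assuming a hypothetical helper `pca_response` and synthetic data: projecting onto the top-$k$ eigenvectors and mapping back is the same as applying a spectral filter whose response is 1 on the $k$ largest eigenvalues and 0 elsewhere.

```python
import numpy as np

def pca_response(lam, k):
    """Ideal response: 1 on the k largest eigenvalues, 0 elsewhere."""
    h = np.zeros_like(lam)
    h[np.argsort(lam)[-k:]] = 1.0
    return h

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 100))      # synthetic data, 8 features
C = np.cov(X)
lam, V = np.linalg.eigh(C)             # eigenvalues in ascending order

k = 5
H_pca = (V * pca_response(lam, k)) @ V.T   # V diag(h) V^T

x = X[:, 0]
Vk = V[:, -k:]                         # top-k eigenvectors
recon = Vk @ (Vk.T @ x)                # project onto them and map back

same = np.allclose(H_pca @ x, recon)
```

Replacing the indicator response with a smoother, localized one is what distinguishes a wavelet-style filter from this PCA filter.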

Theorems & Definitions (15)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Theorem 3
  • Lemma 1
  • proof
  • Definition 2
  • Lemma 2
  • proof
  • ...and 5 more