
Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning

James Chapman, Lennie Wells, Ana Lawry Aguila

TL;DR

This work introduces an unconstrained Eckart–Young loss $\mathcal{L}_{EY}$ that recovers the top-$K$ subspace of generalized eigenvalue problems, enabling fast SGD-based training for the Canonical Correlation Analysis (CCA) family, including RCCA, PLS, and Multiview CCA (MCCA). By formulating $C(\theta)$ and $V(\theta)$ to capture total cross-view covariance and per-view variance, the authors derive EY objectives for linear and nonlinear CCA variants and demonstrate unbiased batch gradient estimates suitable for deep learning. The framework unifies Deep CCA and SSL under a single objective, yielding new methods like DCCA-EY and SSL-EY, which (i) recover known CCA/DCCA solutions in linear or last-layer settings, (ii) achieve faster convergence and higher correlations than prior methods, and (iii) demonstrate scalability on the UK Biobank and competitive SSL performance on CIFAR-10/100 with minimal hyperparameter tuning. Theoretical results establish the absence of spurious local minima and connect SSL methods like VICReg and Barlow Twins to CCA, while empirical results show state-of-the-art performance and scalable, interpretable multiview analyses, signaling a practical and principled path for large-scale multiview and SSL tasks.

Abstract

The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.
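
To make the GEP view mentioned in the abstract concrete, here is a minimal NumPy sketch (not the authors' code). It builds the standard two-view CCA block matrices $A$ (cross-covariances off the diagonal) and $B$ (within-view covariances on the diagonal) from sample data, and checks that the top generalized eigenvalues of $A u = \lambda B u$ match the canonical correlations computed independently via a QR/SVD route. The function names, the synthetic data, and the Cholesky-based solver are illustrative assumptions.

```python
import numpy as np

def cca_gep_matrices(X, Y):
    """Build (A, B) so that two-view CCA reads A u = lambda B u."""
    n, p = X.shape
    q = Y.shape[1]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # A holds the cross-view covariance, B the per-view variance.
    A = np.block([[np.zeros((p, p)), Sxy], [Sxy.T, np.zeros((q, q))]])
    B = np.block([[Sxx, np.zeros((p, q))], [np.zeros((q, p)), Syy]])
    return A, B

def gep_eigenvalues(A, B):
    """Generalized eigenvalues of (A, B), descending, for symmetric A and positive definite B."""
    L = np.linalg.cholesky(B)            # B = L L^T
    Linv = np.linalg.inv(L)
    return np.linalg.eigvalsh(Linv @ A @ Linv.T)[::-1]

def canonical_correlations(X, Y):
    """Reference values: singular values of Qx^T Qy for centred, QR-orthonormalised views."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

# Synthetic two-view data sharing a 2-dimensional latent signal (illustrative only).
rng = np.random.default_rng(0)
n = 500
Z = rng.standard_normal((n, 2))
X = Z @ rng.standard_normal((2, 4)) + 0.5 * rng.standard_normal((n, 4))
Y = Z @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((n, 3))

A, B = cca_gep_matrices(X, Y)
eigs = gep_eigenvalues(A, B)
rho = canonical_correlations(X, Y)
print("top GEP eigenvalues:   ", eigs[:3])
print("canonical correlations:", rho)
```

The full-batch eigendecomposition used here is exactly the kind of classical solver that becomes impractical at scale, which is the motivation for the stochastic approach described in the abstract.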

Paper Structure

This paper contains 34 sections, 24 theorems, 87 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 3.0

The top-$K$ subspace of the GEP $(A,B)$ can be characterized by minimizing the following objective over $U \in \mathbb{R}^{D \times K}$:

$$\mathcal{L}_{EY\text{-}GEP}(U) := \operatorname{trace}\left( -2\, U^\top A U + (U^\top B U)(U^\top B U) \right).$$

Moreover, the minimum value is precisely $- \sum_{k=1}^K \lambda_k^2$, where $(\lambda_k)$ are the generalized eigenvalues.
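
The stated minimum value can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes the objective has the form $\operatorname{trace}(-2\,U^\top A U + (U^\top B U)^2)$, which is consistent with the minimum $-\sum_k \lambda_k^2$, and it uses a random positive semidefinite $A$ and positive definite $B$ so that all generalized eigenvalues are non-negative. The helper names `ey_gep_loss` and `top_k_gep` are hypothetical.

```python
import numpy as np

def ey_gep_loss(U, A, B):
    """Assumed EY-style objective: trace(-2 U^T A U + (U^T B U)^2)."""
    UBU = U.T @ B @ U
    return np.trace(-2.0 * U.T @ A @ U + UBU @ UBU)

def top_k_gep(A, B, K):
    """Top-K generalized eigenpairs of A u = lambda B u, with B-orthonormal eigenvectors."""
    L = np.linalg.cholesky(B)               # B = L L^T
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ A @ Linv.T)   # ascending order
    order = np.argsort(evals)[::-1][:K]
    W = Linv.T @ evecs[:, order]            # satisfies W^T B W = I
    return evals[order], W

rng = np.random.default_rng(0)
D, K = 6, 2
M = rng.standard_normal((D, D))
A = M @ M.T                                 # symmetric positive semidefinite
N = rng.standard_normal((D, D))
B = N @ N.T + D * np.eye(D)                 # symmetric positive definite

lam, W = top_k_gep(A, B, K)
U_star = W * np.sqrt(lam)                   # scale column k by sqrt(lambda_k)
min_value = ey_gep_loss(U_star, A, B)

# Random candidates should never achieve a lower loss than U_star.
random_losses = [ey_gep_loss(rng.standard_normal((D, K)), A, B) for _ in range(200)]

print("loss at U_star:      ", min_value)
print("-sum(lambda_k^2):    ", -np.sum(lam ** 2))
print("best random candidate:", min(random_losses))
```

Scaling each B-orthonormal eigenvector by $\sqrt{\lambda_k}$ gives $U^\top B U = \Lambda$ and $U^\top A U = \Lambda^2$, so the loss evaluates to $\operatorname{trace}(-2\Lambda^2 + \Lambda^2) = -\sum_k \lambda_k^2$. Because the objective is unconstrained, a gradient-based optimizer can work on $U$ directly without any orthogonalization step.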

Figures (4)

  • Figure 1: Stochastic CCA on MediaMill using the Proportion of Correlation Captured (PCC) metric: (a) Across varying mini-batch sizes, trained for a single epoch, and (b) Training progress over a single epoch for mini-batch sizes 5, 100. Shaded regions signify $\pm$ one standard deviation around the mean of 5 runs.
  • Figure 2: Deep CCA on SplitMNIST using the Validation TCC metric: (a) after training each model for 50 epochs with varying batch sizes; (b) learning progress over 50 epochs.
  • Figure 3: Deep multiview CCA on mfeat using the Validation TMCC metric: (a) after training each model for 100 epochs with varying batch sizes; (b) learning progress over 100 epochs.
  • Figure 4: (a) Correlations between PLS components for UK Biobank. (b) Correlations between PLS brain components and genetic risk scores. AD=Alzheimer's disease, SCZ=Schizophrenia, BP=Bipolar, ADHD=Attention deficit hyperactivity disorder, ALS=Amyotrophic lateral sclerosis, PD=Parkinson's disease, EPI=Epilepsy. $\text{ns}: 0.05 < p \leq 1$, $\ast: 0.01 < p \leq 0.05$, $\ast\ast: 0.001 < p \leq 0.01$, $\ast\ast\ast: 0.0001 < p \leq 0.001$.

Theorems & Definitions (55)

  • Proposition 3.0: Eckart–Young inspired objective for GEPs
  • Proposition 3.0
  • Corollary 3.1: Informal: Polynomial-time Optimization
  • Definition 3.2: Family of EY Objectives
  • Lemma 3.3: Objective recovers GEP formulation of linear multiview CCA
  • proof
  • Lemma 3.3
  • proof: Proof sketch: see the supplementary material (label supp:EY-recover-Deep-CCA) for full details.
  • Definition A.1: Top-$K$ subspace
  • Definition A.2: $B$-orthonormality
  • ...and 45 more