Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning

James Chapman; Lennie Wells; Ana Lawry Aguila

TL;DR

This work introduces an unconstrained Eckart–Young loss $\mathcal{L}_{EY}$ that recovers the top-$K$ subspace of generalized eigenvalue problems (GEPs), enabling fast SGD-based training for the Canonical Correlation Analysis (CCA) family, including RCCA, PLS, and Multiview CCA (MCCA). By defining $C(\theta)$ as the total cross-view covariance and $V(\theta)$ as the summed per-view variance, the authors derive EY objectives for linear and nonlinear CCA variants and show that these objectives admit unbiased mini-batch gradient estimates suitable for deep learning. The framework unifies Deep CCA and SSL under a single objective, yielding new methods such as DCCA-EY and SSL-EY, which (i) recover known CCA/DCCA solutions in linear or last-layer settings, (ii) converge faster and reach higher correlations than prior methods, and (iii) scale to the UK Biobank and achieve competitive SSL performance on CIFAR-10/100 with minimal hyperparameter tuning. Theoretical results establish the absence of spurious local minima and connect SSL methods such as VICReg and Barlow Twins to CCA, while empirical results show state-of-the-art performance on standard CCA benchmarks and scalable, interpretable multiview analyses, pointing to a practical and principled route to large-scale multiview and SSL tasks.
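To make $C(\theta)$ and $V(\theta)$ concrete, here is a minimal NumPy sketch of a two-view EY loss computed from representations $Z^{(1)}, Z^{(2)}$. It rests on our reading of the TL;DR, not on the authors' code: we assume $C(\theta) = \operatorname{Cov}(Z^{(1)}, Z^{(2)}) + \operatorname{Cov}(Z^{(2)}, Z^{(1)})$ and $V(\theta) = \operatorname{Var}(Z^{(1)}) + \operatorname{Var}(Z^{(2)})$ (plain CCA, with no ridge or PLS terms), giving the loss $-2\operatorname{trace} C(\theta) + \|V(\theta)\|_F^2$. For linear encoders this reduces to $\operatorname{trace}(-2U^\top AU + (U^\top BU)^2)$, with $A$ holding the cross-view covariance blocks and $B$ the within-view blocks. Building the quadratic term from two independent mini-batches is one simple way to make the estimate unbiased; the paper's own estimator may differ. The helper names `sample_cov` and `ey_loss_estimate` are hypothetical.

```python
import numpy as np


def sample_cov(P, Q):
    """Unbiased (ddof = 1) sample cross-covariance of two (batch, K) arrays."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    return P.T @ Q / (P.shape[0] - 1)


def ey_loss_estimate(Z1a, Z2a, Z1b, Z2b):
    """Hypothetical mini-batch estimate of -2 tr C + ||V||_F^2 for two views.

    (Z1a, Z2a) and (Z1b, Z2b) are the two views' representations of two
    independent mini-batches. Assumed reading (plain CCA, no ridge):
    C = Cov(Z1, Z2) + Cov(Z2, Z1) and V = Var(Z1) + Var(Z2).
    """
    C_a = sample_cov(Z1a, Z2a) + sample_cov(Z2a, Z1a)
    V_a = sample_cov(Z1a, Z1a) + sample_cov(Z2a, Z2a)
    V_b = sample_cov(Z1b, Z1b) + sample_cov(Z2b, Z2b)
    # Independent batches give E[tr(V_a V_b)] = tr(V V) = ||V||_F^2; reusing one
    # batch, tr(V_a V_a), would be biased upwards by the sampling variance of V_a.
    return -2.0 * np.trace(C_a) + np.trace(V_a @ V_b)


# Algebra check: with linear encoders Z1 = X1 @ U1, Z2 = X2 @ U2 and the full data
# passed as both "batches", the loss equals trace(-2 U^T A U + (U^T B U)^2).
rng = np.random.default_rng(0)
n, p, q, K = 500, 5, 4, 2
X1 = rng.standard_normal((n, p))
X2 = 0.5 * X1[:, :q] + rng.standard_normal((n, q))  # correlated second view
U1 = rng.standard_normal((p, K))
U2 = rng.standard_normal((q, K))

S = np.cov(np.hstack([X1, X2]), rowvar=False)       # joint covariance, ddof = 1
A = np.zeros_like(S)
A[:p, p:], A[p:, :p] = S[:p, p:], S[p:, :p]         # cross-view blocks
B = np.zeros_like(S)
B[:p, :p], B[p:, p:] = S[:p, :p], S[p:, p:]         # within-view blocks
U = np.vstack([U1, U2])
M = U.T @ B @ U
gep_value = np.trace(-2.0 * U.T @ A @ U + M @ M)

Z1, Z2 = X1 @ U1, X2 @ U2
batch_value = ey_loss_estimate(Z1, Z2, Z1, Z2)
print(gep_value, batch_value)  # agree up to floating-point error
```

Passing the same batch twice is done here only to check the algebra; in training, the two arguments pairs should come from independent draws.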

Abstract

The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, obtained simply by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms converge far faster and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and present theory clarifying the links between these methods and classical CCA, laying the groundwork for future insights.
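The abstract's recipe, plain SGD on the EY objective, can be sketched for linear two-view CCA as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' implementation: the helpers (`batch_moments`, `ey_objective`, `make_two_views`, `sgd_cca_ey`), the synthetic data with canonical correlations 0.8 and 0.5, and the untuned step size, decay schedule and batch size are all hypothetical. The gradient of $\operatorname{trace}(-2U^\top AU + (U^\top BU)^2)$ is $-4AU + 4BU(U^\top BU)$; estimating the two $B$ factors from two independent mini-batches keeps the gradient estimate unbiased.

```python
import numpy as np


def batch_moments(X, Y):
    """Estimates of A = [[0, Sxy], [Syx, 0]] and B = blockdiag(Sxx, Syy)."""
    p = X.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)  # ddof = 1
    A = np.zeros_like(S)
    A[:p, p:], A[p:, :p] = S[:p, p:], S[p:, :p]
    B = np.zeros_like(S)
    B[:p, :p], B[p:, p:] = S[:p, :p], S[p:, p:]
    return A, B


def ey_objective(U, A, B):
    """trace(-2 U^T A U + (U^T B U)^2)."""
    M = U.T @ B @ U
    return np.trace(-2.0 * U.T @ A @ U + M @ M)


def make_two_views(n, rng):
    """Hypothetical data: canonical correlations 0.8 and 0.5, then zeros."""
    rho = np.array([0.8, 0.5])
    z = rng.standard_normal((n, 2))                   # shared latent signal
    X = rng.standard_normal((n, 4))
    Y = rng.standard_normal((n, 4))
    X[:, :2] = z
    Y[:, :2] = rho * z + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, 2))
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random rotation hides the structure
    return X @ Q, Y


def sgd_cca_ey(X, Y, K=2, batch_size=200, lr=0.05, steps=1000, seed=0):
    """Plain SGD on the EY objective for linear two-view CCA (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape[0], X.shape[1] + Y.shape[1]
    U = 0.1 * rng.standard_normal((d, K))             # stacked weights [U_x; U_y]
    for t in range(steps):
        # Two independent mini-batches so the quadratic term is estimated without bias.
        i1 = rng.choice(n, batch_size, replace=False)
        i2 = rng.choice(n, batch_size, replace=False)
        A1, B1 = batch_moments(X[i1], Y[i1])
        A2, B2 = batch_moments(X[i2], Y[i2])
        # Unbiased estimate of -4 A U + 4 B U (U^T B U), symmetrised over the batches.
        grad = (-2.0 * (A1 + A2) @ U
                + 2.0 * (B1 @ U @ (U.T @ B2 @ U) + B2 @ U @ (U.T @ B1 @ U)))
        U -= lr / (1.0 + t / 250.0) * grad            # simple step-size decay
    return U


rng = np.random.default_rng(1)
X, Y = make_two_views(5000, rng)
U = sgd_cca_ey(X, Y)
A, B = batch_moments(X, Y)                            # full-data matrices
final_loss = ey_objective(U, A, B)
print("EY loss:", final_loss, "approx. optimum:", -(0.8 ** 2 + 0.5 ** 2))
```

Since the synthetic canonical correlations are 0.8 and 0.5, the attainable minimum is roughly $-(0.8^2 + 0.5^2) = -0.89$, and the final loss should land near it; the exact value depends on the random draws. Each step touches only two small batches, which is what makes this style of training scale to large datasets.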

Paper Structure (34 sections, 24 theorems, 87 equations, 4 figures, 1 table, 1 algorithm)

Introduction
A unified approach to the CCA family
Background: GEPs in linear algebra
The CCA Family
Novel Objectives and Algorithms
Unconstrained objective for GEPs
Corresponding Objectives for the CCA family
Applications to multiview stochastic CCA and PLS, and Deep CCA
Application to SSL
Related Work
Experiments
Evaluating CCA-EY for Stochastic CCA
Evaluating DCCA-EY for Deep CCA
Extending DCCA-EY to the multiview setting
Scaling PLS to the UK Biobank with PLS-EY
...and 19 more sections

Key Result

Proposition 3.0

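The top-$K$ subspace of the GEP $(A,B)$ can be characterized by minimizing the following objective over $U \in \mathbb{R}^{D \times K}$:

$$\mathcal{L}_{EY}(U) = \operatorname{trace}\left(-2\, U^\top A U + \left(U^\top B U\right)\left(U^\top B U\right)\right).$$

Moreover, the minimum value is precisely $- \sum_{k=1}^K \lambda_k^2$, where $\lambda_1 \geq \dots \geq \lambda_K$ are the top-$K$ generalized eigenvalues.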
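The following is a quick numerical check of the proposition, written as a minimal NumPy sketch; the helper names `ey_objective` and `top_k_gep` are ours, not the paper's. With the Cholesky factorization $B = LL^\top$, $\tilde U = L^\top U$ and $\tilde A = L^{-1} A L^{-\top}$, the objective $\operatorname{trace}(-2U^\top AU + (U^\top BU)^2)$ equals $\|\tilde A - \tilde U \tilde U^\top\|_F^2 - \|\tilde A\|_F^2$. An Eckart–Young-type argument therefore shows it is minimized by $U = W\Lambda^{1/2}$, where $W$ holds $B$-orthonormal top-$K$ generalized eigenvectors and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_K)$. The sketch assumes $\lambda_K > 0$ (we choose a positive semidefinite $A$ so this holds).

```python
import numpy as np


def ey_objective(U, A, B):
    """trace(-2 U^T A U + (U^T B U)(U^T B U))."""
    M = U.T @ B @ U
    return np.trace(-2.0 * U.T @ A @ U + M @ M)


def top_k_gep(A, B, K):
    """Top-K generalized eigenpairs of (A, B) via a Cholesky reduction of B.

    Returns eigenvalues in decreasing order and B-orthonormal eigenvectors W,
    i.e. A W = B W diag(lam) and W^T B W = I.
    """
    L = np.linalg.cholesky(B)           # B = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T             # equivalent symmetric standard eigenproblem
    lam, V = np.linalg.eigh(C)          # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:K]   # indices of the K largest
    return lam[order], L_inv.T @ V[:, order]


rng = np.random.default_rng(0)
D, K = 6, 2
S = rng.standard_normal((D, D))
A = S @ S.T                             # PSD (assumption), so every lambda_k >= 0
R = rng.standard_normal((D, D))
B = R @ R.T + D * np.eye(D)             # symmetric positive definite B

lam, W = top_k_gep(A, B, K)
U_star = W * np.sqrt(lam)               # scale column k by sqrt(lambda_k)
print(ey_objective(U_star, A, B), -np.sum(lam ** 2))
```

Random choices of $U$ can be compared against `-np.sum(lam ** 2)`; by the proposition none of them should fall below it.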

Figures (4)

Figure 1: Stochastic CCA on MediaMill using the Proportion of Correlation Captured (PCC) metric: (a) Across varying mini-batch sizes, trained for a single epoch, and (b) Training progress over a single epoch for mini-batch sizes 5, 100. Shaded regions signify $\pm$ one standard deviation around the mean of 5 runs.
Figure 2: Deep CCA on SplitMNIST using the Validation TCC metric: (a) after training each model for 50 epochs with varying batch sizes; (b) learning progress over 50 epochs.
Figure 3: Deep multiview CCA on mfeat using the Validation TMCC metric: (a) after training each model for 100 epochs with varying batch sizes; (b) learning progress over 100 epochs.
Figure 4: (a) Correlations between PLS components for UK Biobank. (b) Correlations between PLS brain components and genetic risk scores. AD=Alzheimer's disease, SCZ=Schizophrenia, BP=Bipolar, ADHD=Attention deficit hyperactivity disorder, ALS=Amyotrophic lateral sclerosis, PD=Parkinson's disease, EPI=Epilepsy. $\text{ns}: 0.05 < p \leq 1$; $\ast: 0.01 < p \leq 0.05$; $\ast\ast: 0.001 < p \leq 0.01$; $\ast\ast\ast: 0.0001 < p \leq 0.001$.

Theorems & Definitions (55)

Proposition 3.0: Eckart–Young inspired objective for GEPs
Proposition 3.0
Corollary 3.1: Informal: Polynomial-time Optimization
Definition 3.2: Family of EY Objectives
Lemma 3.3: Objective recovers GEP formulation of linear multiview CCA
proof
Lemma 3.3
proof: Proof sketch; see supplementary section supp:EY-recover-Deep-CCA for full details.
Definition A.1: Top-$K$ subspace
Definition A.2: $B$-orthonormality
...and 45 more