Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
James Chapman, Lennie Wells, Ana Lawry Aguila
TL;DR
This work introduces an unconstrained Eckart–Young loss $\mathcal{L}_{EY}$ whose minimisers recover the top-$K$ subspace of a generalized eigenvalue problem, enabling fast SGD-based training for the Canonical Correlation Analysis (CCA) family, including regularised CCA (RCCA), PLS, and Multiview CCA (MCCA). Defining matrices $C(\theta)$ and $V(\theta)$ that capture the total cross-view covariance and the per-view variance respectively, the authors derive EY objectives for linear and nonlinear CCA variants and obtain unbiased mini-batch gradient estimates suitable for deep learning. The framework places Deep CCA and SSL under a single objective, yielding the new methods DCCA-EY and SSL-EY, which (i) recover known CCA/DCCA solutions in linear or last-layer settings, (ii) converge faster and reach higher correlations than prior methods, and (iii) scale to the UK Biobank and perform competitively with existing SSL methods on CIFAR-10/100 with minimal hyperparameter tuning. Theoretical results establish the absence of spurious local minima and connect SSL methods such as VICReg and Barlow Twins to CCA. Empirically, the methods outperform the previous state of the art on standard CCA and Deep CCA benchmarks, pointing to a practical and principled route to large-scale multiview and SSL tasks.
Abstract
The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and can be unified within a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First, we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, obtained simply by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms converge far faster and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyperparameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.
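To make the idea of an unconstrained objective for a GEP concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the loss for a GEP $Aw = \lambda Bw$ takes the form $\mathcal{L}_{EY}(W) = -2\,\mathrm{tr}(W^\top A W) + \|W^\top B W\|_F^2$, which is not stated in the abstract itself, and it uses full-batch gradient descent on a small synthetic problem in place of SGD. The helper names (`ey_loss`, `ey_grad`, `make_gep`, `minimise_ey`) and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np


def ey_loss(W, A, B):
    """Assumed Eckart-Young loss: -2 tr(W^T A W) + ||W^T B W||_F^2."""
    return -2.0 * np.trace(W.T @ A @ W) + np.sum((W.T @ B @ W) ** 2)


def ey_grad(W, A, B):
    """Gradient of ey_loss with respect to W, for symmetric A and B."""
    return -4.0 * A @ W + 4.0 * B @ W @ (W.T @ B @ W)


def make_gep(eigvals, rng):
    """Build a synthetic pair (A, B) whose generalized eigenvalues are eigvals."""
    d = len(eigvals)
    M = rng.standard_normal((d, d))
    S = M @ M.T
    # B is symmetric positive definite with eigenvalues in [1, 1.5].
    B = np.eye(d) + 0.5 * S / np.linalg.norm(S, 2)
    L = np.linalg.cholesky(B)            # B = L L^T
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    U = np.linalg.solve(L.T, Q)          # B-orthonormal: U^T B U = I
    A = B @ U @ np.diag(eigvals) @ U.T @ B   # A u_i = eigvals[i] * B u_i
    return A, B


def minimise_ey(A, B, k, rng, lr=0.01, steps=5000):
    """Plain gradient descent on the unconstrained loss from a small random start."""
    W = 0.1 * rng.standard_normal((A.shape[0], k))
    for _ in range(steps):
        W -= lr * ey_grad(W, A, B)
    return W


rng = np.random.default_rng(0)
eigvals = np.array([0.9, 0.7, 0.3, 0.1, -0.2, -0.5])
A, B = make_gep(eigvals, rng)
W = minimise_ey(A, B, k=2, rng=rng)

final_loss = ey_loss(W, A, B)
# Under the assumed form, the global minimum is minus the sum of the
# squared top-k (positive) generalized eigenvalues.
optimum = -np.sum(eigvals[:2] ** 2)
print(final_loss, optimum)
```

No orthogonality or whitening constraint is imposed at any step. Under the assumed form of the loss, the penalty term $\|W^\top B W\|_F^2$ alone keeps the iterates bounded, and at a minimiser the eigenvalues of $W^\top B W$ coincide with the top-$K$ generalized eigenvalues.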
