Canonical Correlation Analysis: review
Anna Bykhovskaya, Vadim Gorin
TL;DR
This survey traces Canonical Correlation Analysis (CCA) from its foundational theorems to contemporary high-dimensional regimes, tying classical multivariate methods to random matrix theory. It highlights that, in Gaussian settings under independence, the squared canonical correlations follow Jacobi/Wachter-type laws, with precise limits for fixed and growing dimensions and universal fluctuations at the edge (Airy/Tracy–Widom-type behavior). The text covers estimation via Gaussian maximum likelihood (MLE) and develops spike (signal) theory for detecting and estimating a few strong cross-subspace signals, including cointegration in time series. Together, these results provide a rigorous framework for inference in high-dimensional CCA, covering independence testing, spike detection, and cointegration analysis, while clarifying the limitations under subcritical signals and motivating regularization approaches for practical reliability.
Abstract
For over a century, canonical correlations, canonical variables, and related concepts have been studied across various fields, with contributions dating back to Jordan [1875] and Hotelling [1936]. This text surveys the evolution of canonical correlation analysis, a fundamental statistical tool, beginning with its foundational theorems and progressing to recent developments and open research problems. Along the way we introduce and review methods, notions, and fundamental concepts from linear algebra, random matrix theory, and high-dimensional statistics, placing particular emphasis on rigorous mathematical treatment. The survey is intended for technically proficient graduate students and other researchers with an interest in this area. The content is organized into five chapters, supplemented by six sets of exercises found in Chapter 6. These exercises introduce additional material, reinforce key concepts, and bridge ideas across chapters. We recommend the following sequence: first solve Problem Set 0, then read Chapter 1, then solve Problem Set 1, and so on through the text.
