Canonical Correlation Analysis: review

Anna Bykhovskaya, Vadim Gorin

TL;DR

This survey traces Canonical Correlation Analysis (CCA) from its foundational theorems to contemporary high-dimensional regimes, connecting classical multivariate methods to random matrix theory. It highlights that, in Gaussian settings, the squared canonical correlations follow Jacobi/Wachter-type laws, with precise limits for both fixed and growing dimensions and universal edge fluctuations of Airy/Tracy–Widom type. The text treats estimation via the Gaussian maximum likelihood estimator (MLE), characterizes the distribution of canonical correlations under independence (Jacobi/Wachter), and develops spike (signal) theory for detecting and estimating a few strong signals shared between the two sets of variables, including cointegration in time series. Collectively, the work provides a rigorous, end-to-end framework for inference in high-dimensional CCA, covering independence testing, spike detection, and cointegration analysis, while clarifying the limitations that arise when signals are subcritical (too weak to be detected) and motivating regularization approaches for practical reliability.
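To make "squared canonical correlations" concrete, the sketch below computes them for two data matrices as the eigenvalues of the $K\times K$ matrix $S_{XX}^{-1}S_{XY}S_{YY}^{-1}S_{YX}$, and plants one shared signal to show a single large ("spiked") correlation standing out from the near-zero rest. The function name, the mean-zero convention (no demeaning), and the planted-signal setup are illustrative assumptions, not taken from the survey.

```python
import numpy as np


def squared_canonical_correlations(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Squared sample canonical correlations between the rows of x (K x S) and y (M x S).

    Hypothetical helper: eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx, assuming
    K <= M < S and invertible sample covariances. Data are NOT demeaned
    (a mean-zero model is assumed). The 1/S normalization cancels, so raw
    cross-products are used.
    """
    sxx = x @ x.T
    syy = y @ y.T
    sxy = x @ y.T
    # Solve linear systems rather than forming explicit inverses.
    mat = np.linalg.solve(sxx, sxy @ np.linalg.solve(syy, sxy.T))
    eig = np.linalg.eigvals(mat).real  # real in exact arithmetic; drop rounding noise
    return np.sort(np.clip(eig, 0.0, 1.0))[::-1]  # descending order


rng = np.random.default_rng(0)
K, M, S = 3, 4, 2000
x = rng.standard_normal((K, S))
y = rng.standard_normal((M, S))
# Plant one shared signal: y_0 = x_0 + unit-variance noise, so the
# population squared correlation of (x_0, y_0) is 1 / (1 + 1) = 0.5.
y[0] = x[0] + rng.standard_normal(S)
r2 = squared_canonical_correlations(x, y)
print(np.round(r2, 3))
```

The largest value should land near 0.5, while the others stay close to zero, of order $M/S$.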

Abstract

For over a century, canonical correlations, canonical variables, and related concepts have been studied across various fields, with contributions dating back to Jordan [1875] and Hotelling [1936]. This text surveys the evolution of canonical correlation analysis, a fundamental statistical tool, beginning with its foundational theorems and progressing to recent developments and open research problems. Along the way we introduce and review methods, notions, and fundamental concepts from linear algebra, random matrix theory, and high-dimensional statistics, with particular emphasis on rigorous mathematical treatment. The survey is intended for technically proficient graduate students and other researchers interested in this area. The content is organized into five chapters, supplemented by six sets of exercises in Chapter 6. These exercises introduce additional material, reinforce key concepts, and bridge ideas across chapters. We recommend the following sequence: first solve Problem Set 0, then read Chapter 1, solve Problem Set 1, and so on through the text.

Paper Structure

This paper contains 40 sections, 35 theorems, 154 equations, and 10 figures.

Key Result

Theorem 1.1

Let $\mathbf W$ be a linear space with a scalar product $\langle \cdot,\cdot\rangle$. (Throughout this text we work with real vector spaces, such as $\mathbb{R}^n$, though the theory also extends to complex spaces like $\mathbb{C}^n$, where some aspects are, in fact, simpler.) Suppose that $K\le M$ and … where $1\ge c_1\ge c_2 \ge \dots \ge c_{K}\ge 0$.
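The statement of Theorem 1.1 is truncated in this excerpt. Assuming it is the classical result going back to Jordan, namely that a $K$-dimensional and an $M$-dimensional subspace admit orthonormal bases $v_1,\dots,v_K$ and $u_1,\dots,u_M$ with $\langle v_i,u_j\rangle = c_i\,\delta_{ij}$, the numbers $c_i$ are the cosines of the principal angles. The sketch below computes them in $\mathbb{R}^n$ with the standard dot product: orthonormalize each subspace by QR, then take the SVD of the cross-Gram matrix. The helper name is hypothetical.

```python
import numpy as np


def canonical_bases(a: np.ndarray, b: np.ndarray):
    """Cosines c_1 >= ... >= c_K and matching orthonormal bases for the column
    spaces of a (n x K) and b (n x M), K <= M, under the standard dot product.

    Hypothetical helper; assumes a and b have full column rank.
    """
    qa, _ = np.linalg.qr(a)  # orthonormal basis of span(a), n x K
    qb, _ = np.linalg.qr(b)  # orthonormal basis of span(b), n x M
    # Full SVD of the K x M cross-Gram matrix: left is K x K, right_t is M x M,
    # and the singular values c come out in descending order.
    left, c, right_t = np.linalg.svd(qa.T @ qb)
    v = qa @ left        # rotated orthonormal basis of span(a)
    u = qb @ right_t.T   # rotated orthonormal basis of span(b)
    return c, v, u


rng = np.random.default_rng(1)
n, K, M = 10, 3, 5
c, v, u = canonical_bases(rng.standard_normal((n, K)), rng.standard_normal((n, M)))
gram = v.T @ u  # K x M matrix of <v_i, u_j>; should equal [diag(c) | 0]
print(np.round(c, 4))
```

Because both bases are rotated by the singular vectors, the scalar products $\langle v_i,u_j\rangle$ vanish off the diagonal, which is the structure the theorem describes.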

Figures (10)

  • Figure 1: S&P 100 stocks log-prices vs time increments and Wachter.
  • Figure 2: Cryptocurrency log-prices vs time increments and Wachter.
  • Figure 3: Cyclical vs non-cyclical stock returns and Wachter.
  • Figure 4: Histogram of simulated squared sample canonical correlations (blue columns). $K=100$, $M=150$, $S=500$. The density of the Wachter distribution is shown in orange.
  • Figure 5: Functions defined in equations (eq_zrho), (eq_sx), and (eq_sy) for $K=1000$, $M=1500$, $S=8000$.
  • ...and 5 more figures
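Figure 4 compares simulated squared sample canonical correlations with the Wachter density. A minimal sketch of such a comparison for the caption's $K=100$, $M=150$, $S=500$ follows. It assumes independent mean-zero Gaussian data (no demeaning) and uses the Wachter density in the form $\mu(x)=\frac{\sqrt{(x-\lambda_-)(\lambda_+-x)}}{2\pi\tau_K\,x(1-x)}$ on $[\lambda_-,\lambda_+]$, with $\lambda_\pm=\big(\sqrt{\tau_M(1-\tau_K)}\pm\sqrt{\tau_K(1-\tau_M)}\big)^2$, $\tau_K=K/S\le\tau_M=M/S$, and $\tau_K+\tau_M<1$. This parametrization is our assumption and should be checked against the survey's own definition; the function names are hypothetical.

```python
import numpy as np


def squared_cc_null(K: int, M: int, S: int, rng: np.random.Generator) -> np.ndarray:
    """Squared sample canonical correlations of independent Gaussian samples.

    The K (resp. M) variables are stored as columns of an S x K (resp. S x M)
    matrix; canonical correlations are the singular values of Qx^T Qy, where
    Qx, Qy are orthonormal bases of the column spaces. No demeaning.
    """
    qx, _ = np.linalg.qr(rng.standard_normal((S, K)))
    qy, _ = np.linalg.qr(rng.standard_normal((S, M)))
    return np.linalg.svd(qx.T @ qy, compute_uv=False) ** 2


def wachter_density(x, tau_k: float, tau_m: float) -> np.ndarray:
    """Assumed Wachter density for tau_k <= tau_m and tau_k + tau_m < 1."""
    x = np.asarray(x, dtype=float)
    u = np.sqrt(tau_m * (1 - tau_k))
    v = np.sqrt(tau_k * (1 - tau_m))
    lo, hi = (u - v) ** 2, (u + v) ** 2  # support edges lambda_-, lambda_+
    inside = (x > lo) & (x < hi)
    out = np.zeros_like(x)
    xi = x[inside]
    out[inside] = np.sqrt((xi - lo) * (hi - xi)) / (2 * np.pi * tau_k * xi * (1 - xi))
    return out


K, M, S = 100, 150, 500
rng = np.random.default_rng(2)
r2 = squared_cc_null(K, M, S, rng)

grid = np.linspace(0.0, 1.0, 20001)
dx = grid[1] - grid[0]
dens = wachter_density(grid, K / S, M / S)
mass = dens.sum() * dx                 # should be close to 1
mean_w = (grid * dens).sum() * dx      # mean of the density
hist, edges = np.histogram(r2, bins=20, range=(0, 1), density=True)
```

Plotting `hist` against `dens` (for instance with matplotlib) should give a picture qualitatively like Figure 4. Under this parametrization the density has mean $\tau_M=M/S$, which also matches the exact expected average of the simulated values, so `r2.mean()` should sit near $0.3$.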

Theorems & Definitions (81)

  • Theorem 1.1
  • Proposition 1.2
  • Remark 1.3
  • Lemma 1.4
  • Proof
  • Proof of the proposition on CCA as eigenvectors
  • Corollary 1.5
  • Proof
  • Proposition 1.6
  • Proof
  • ...and 71 more