
Two approaches to multiple canonical correlation analysis for repeated measures data

Tomasz Górecki, Mirosław Krzyśko, Felix Gnettner, Piotr Kokoszka

TL;DR

This work extends canonical correlation analysis (CCA) to more than two data blocks and to functional data through two general frameworks. The first is multiple kernel CCA (MKCCA) for repeated measures data, based on embeddings into reproducing kernel Hilbert spaces (RKHS). The second is multiple functional CCA (MFCCA) for multivariate functional data. For both, it develops population formulations and regularized sample procedures. It also derives consistency rates under weaker assumptions than usual, allowing non-compact cross-covariance operators and dependent observations. The authors use two real-data studies, a Polish agricultural data set and the Global Competitiveness Index (GCI) data set, and find that MFCCA often yields higher generalized canonical correlations and clearer clustering than MKCCA. The work provides a unified operator-theoretic view of CCA extensions that can support inference for high-dimensional, time-dependent, or functional data. It points to regularized and sparse variants as future work.
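
A rough sketch of a regularized multi-block kernel problem of the kind the TL;DR refers to follows, in Python with NumPy. It is a generic regularized kernel CCA for $L$ blocks observed on the same $n$ units. It is not the authors' MKCCA estimator, because it ignores the repeated-measures (time) structure. The Gaussian kernel, the bandwidth `sigma`, the ridge scaling `n * kappa`, and all function names are assumptions made for illustration. Products of centered Gram matrices stand in for cross-covariance operators, and ridge-regularized squared Gram matrices stand in for within-block covariances.

```python
import numpy as np


def centered_gaussian_gram(x, sigma):
    """Gaussian Gram matrix of the rows of x (n x p), double-centered."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    return H @ K @ H


def multi_block_kernel_cca(blocks, sigma=0.5, kappa=1e-2):
    """Leading eigenpair of a regularized multi-block kernel CCA problem (sketch).

    blocks: list of L arrays of shape (n, p_l); rows are the same n units.
    Equivalent to C a = lam D a with C_ij = K_i K_j (i != j), C_ii = 0 and
    D = blockdiag((K_i + r I)^2), r = n * kappa, solved after the change of
    variables b_i = (K_i + r I) a_i. For L = 2 the top eigenvalue is the
    first regularized kernel canonical correlation.
    """
    grams = [centered_gaussian_gram(np.asarray(b, dtype=float), sigma) for b in blocks]
    n, L = grams[0].shape[0], len(grams)
    ridge = n * kappa * np.eye(n)  # hypothetical scaling of the regularizer
    # P_i = (K_i + r I)^{-1} K_i, a shrunken version of the Gram matrix
    P = [np.linalg.solve(G + ridge, G) for G in grams]
    M = np.zeros((L * n, L * n))
    for i in range(L):
        for j in range(L):
            if i != j:
                # Block (i, j) equals (K_i + r I)^{-1} K_i K_j (K_j + r I)^{-1}
                M[i * n:(i + 1) * n, j * n:(j + 1) * n] = P[i] @ P[j].T
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    b = vecs[:, -1]
    # Map back to dual coefficients a_i = (K_i + r I)^{-1} b_i
    alphas = [np.linalg.solve(grams[i] + ridge, b[i * n:(i + 1) * n]) for i in range(L)]
    return vals[-1], alphas


# Example: y depends on x nonlinearly, so their linear correlation is near zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = x ** 2 + 0.05 * rng.standard_normal((200, 1))
rho, alphas = multi_block_kernel_cca([x, y])
lam3, _ = multi_block_kernel_cca([x, y, rng.standard_normal((200, 2))])
print(np.corrcoef(x[:, 0], y[:, 0])[0, 1], rho, lam3)
```

With two blocks, the top eigenvalue is a regularized kernel canonical correlation and is at most 1. With $L$ blocks, it measures summed pairwise agreement and is at most $L-1$. The ridge term is essential. Without it, kernel CCA on Gram matrices can report a correlation of 1 regardless of the data.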

Abstract

In classical canonical correlation analysis (CCA), the goal is to determine the linear transformations of two random vectors into two new random variables that are most strongly correlated. Canonical variables are pairs of these new random variables, and canonical correlations are the correlations within these pairs. In this paper, we propose and study two generalizations of this classical method. (1) Instead of two random vectors, we study more complex data structures that appear in important applications. In these structures, there are $L$ features, each described by $p_l$ scalars, $1 \le l \le L$. We observe $n$ such objects over $T$ time points. We derive a suitable analog of CCA for such data. Our approach relies on embeddings into reproducing kernel Hilbert spaces and also covers several related data structures. (2) We develop an analogous approach for multidimensional random processes. In this case, the experimental units are multivariate, continuous, square-integrable functions over a given interval. These functions are modeled as elements of a Hilbert space, and in this setting we define multiple functional canonical correlation analysis (MFCCA). We justify both approaches through applications to two data sets and through suitable large-sample theory. We derive consistency rates for the related transformation and correlation estimators. We also show that two common assumptions can be relaxed: the compactness of the underlying cross-covariance operators and the independence of the data.
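
Part (2) calls for a computable form of functional CCA. One standard approach, assumed here rather than taken from the paper, is to expand each component curve in an orthonormal basis. For functions in the span of such a basis, $L^2$ inner products equal Euclidean inner products of coefficient vectors. CCA between two functional features therefore reduces to ridge-regularized classical CCA between their stacked coefficients. The sketch below uses a Fourier basis with `K = 5` functions, a ridge `lam`, and simulated curves. All names and settings are hypothetical. It also handles only two features, whereas MFCCA treats $L$ of them.

```python
import numpy as np


def fourier_basis(t, K):
    """First K (odd) functions of the orthonormal Fourier basis on [0, 1]."""
    cols = [np.ones_like(t)]
    for k in range(1, (K - 1) // 2 + 1):
        cols.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        cols.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def basis_coefficients(curves, t, K):
    """Least-squares basis coefficients of discretized curves.

    curves: array (n, p, m) holding p component curves per unit on grid t.
    Returns an (n, p*K) matrix: the K coefficients of each component, stacked.
    """
    n, p, m = curves.shape
    Phi = fourier_basis(t, K)  # shape (m, K)
    coef, *_ = np.linalg.lstsq(Phi, curves.reshape(n * p, m).T, rcond=None)
    return coef.T.reshape(n, p * K)


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T


def ridge_cca(X, Y, lam=1e-3):
    """First canonical correlation and weights, with lam*I added to each covariance."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / n + lam * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + lam * np.eye(Y.shape[1])
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Wx @ (Xc.T @ Yc / n) @ Wy)
    return s[0], Wx @ U[:, 0], Wy @ Vt[0]


# Simulated data: feature 1 has two component curves, feature 2 has one.
rng = np.random.default_rng(0)
n, t = 300, np.linspace(0.0, 1.0, 101)
z = rng.standard_normal(n)  # shared latent score
sin1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
cos1 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
feat1 = np.stack([
    np.outer(z + rng.standard_normal(n), sin1),  # component linked to z
    np.outer(rng.standard_normal(n), cos1),      # unrelated component
], axis=1) + 0.2 * rng.standard_normal((n, 2, t.size))
feat2 = np.outer(z + rng.standard_normal(n), cos1)[:, None, :] \
    + 0.2 * rng.standard_normal((n, 1, t.size))
rho, a, b = ridge_cca(basis_coefficients(feat1, t, 5), basis_coefficients(feat2, t, 5))
print(rho)  # the population value is about 0.5 by construction
```

The weight vectors `a` and `b` are basis coefficients. Multiplying them by the basis functions gives weight functions for the two features.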

Paper Structure

This paper contains 10 sections, 2 theorems, 52 equations, 8 figures, 3 tables.

Key Result

Theorem 1

Let Assumptions asm1-asm4 hold. Furthermore, assume that $\mathfrak C_{L}$ is closed and bounded and has isolated eigenvalues of multiplicity one. Let $\widehat{\rho}_n$ be the estimator arising from problem e:CCAproblem_short for an eigenvalue $\rho$ of $\mathfrak C_{L}$, and $\widehat{\mathfrak{f}}$ the c…
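
Theorem 1 is a consistency statement for $\widehat{\rho}_n$. The simulation below illustrates only the simplest analog of that behavior. It uses i.i.d. observations of two Gaussian random vectors, unregularized linear CCA, and a population first canonical correlation of exactly 0.5 by construction. It does not reproduce the theorem's operator setting, regularization, or dependent data. In this classical setting, the error is known to shrink at the rate $n^{-1/2}$. The QR-based formula (canonical correlations as singular values of $Q_X^\top Q_Y$) and all names are choices made for this sketch.

```python
import numpy as np


def first_canonical_corr(X, Y):
    """Largest sample canonical correlation: top singular value of Qx^T Qy."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0]


def simulate(n, rng):
    """One sample whose population first canonical correlation is exactly 0.5."""
    z = rng.standard_normal(n)
    X = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])
    Y = np.column_stack([z + rng.standard_normal(n), rng.standard_normal(n)])
    return X, Y


rng = np.random.default_rng(42)
sizes = [100, 1000, 10000]
mean_abs_err = []
for n in sizes:
    # Average absolute error over 50 replications at each sample size
    errs = [abs(first_canonical_corr(*simulate(n, rng)) - 0.5) for _ in range(50)]
    mean_abs_err.append(float(np.mean(errs)))
print(dict(zip(sizes, mean_abs_err)))  # errors shrink roughly like n**-0.5
```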

Figures (8)

  • Figure 1: Scatterplot of the optimally transformed feature pairs in the GCI data set (115 countries and five groups) in the coordinates of the first two multiple kernel canonical variables $(U^{(1)}, U^{(2)})$ (with 95% normal confidence ellipses). The optimal transformations were determined by multiple kernel CCA for multivariate repeated measures data, as described in the kernel CCA section of the paper.
  • Figure 2: Macroregions (left) and voivodeships (right) in Poland (2018)
  • Figure 3: Scatterplot of the optimally transformed feature pairs in the agriculture data set (16 voivodeships and seven regions) in the coordinates of the first two multiple kernel canonical variables $(U^{(1)}, U^{(2)})$. The optimal transformations were determined by multiple kernel CCA for repeated measures data, as described in the kernel CCA section of the paper.
  • Figure 4: WWRPP index for Poland (country mean = 66.6 points)
  • Figure 5: Scatterplot of the optimally transformed feature pairs in the agriculture data set (16 voivodeships and seven regions) in the coordinates of the first two multiple functional canonical variables $(U^{(1)}, U^{(2)})$. The optimal transformations were determined by multiple functional CCA, as described in the functional CCA section of the paper.
  • ...and 3 more figures
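
Figure 1 shows 95% normal confidence ellipses, but the captions do not say how these were computed. One common construction, assumed here, centers the ellipse at the group mean. It then scales the principal axes of the sample covariance by $\sqrt{\chi^2_{2,0.95}} = \sqrt{-2\ln 0.05} \approx 2.448$, so that the ellipse covers about 95% of a bivariate normal point cloud. Variants based on the F distribution are also used, especially for small groups. The function names below are hypothetical.

```python
import numpy as np

# 0.95 quantile of the chi-square distribution with 2 degrees of freedom: -2 ln(0.05)
CHI2_2_95 = -2.0 * np.log(0.05)


def normal_ellipse(points, n_vertices=200):
    """Vertices of the 95% normal-theory ellipse of a 2-D point cloud."""
    pts = np.asarray(points, dtype=float)
    mu = pts.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(pts, rowvar=False))
    theta = np.linspace(0.0, 2.0 * np.pi, n_vertices)
    circle = np.column_stack([np.cos(theta), np.sin(theta)])
    # Map the unit circle through the covariance square root, scale, and shift.
    return mu + np.sqrt(CHI2_2_95) * circle @ (V * np.sqrt(w)).T


def mahalanobis_sq(points, ref):
    """Squared Mahalanobis distances of points w.r.t. the mean/covariance of ref."""
    ref = np.asarray(ref, dtype=float)
    d = np.asarray(points, dtype=float) - ref.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(ref, rowvar=False))
    return np.einsum("ij,jk,ik->i", d, S_inv, d)


rng = np.random.default_rng(3)
cloud = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 2.0]], size=20000)
ellipse = normal_ellipse(cloud)
inside = np.mean(mahalanobis_sq(cloud, cloud) <= CHI2_2_95)
print(ellipse.shape, inside)  # the share of points inside should be close to 0.95
```

Every vertex lies at squared Mahalanobis distance $\chi^2_{2,0.95}$ from the group mean. Drawing one such ellipse per group in the $(U^{(1)}, U^{(2)})$ plane gives plots of the kind described in the captions.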

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 3
  • Example 1
  • Example 2
  • Theorem 1
  • Proof
  • ...and 3 more