Permutation Inference for Canonical Correlation Analysis

Anderson M. Winkler, Olivier Renaud, Stephen M. Smith, Thomas E. Nichols

TL;DR

Transforming the residuals to a lower-dimensional basis where exchangeability holds yields a valid permutation test. The paper also proposes a complete algorithm for permutation inference for CCA and discusses how to address the multiplicity of tests.

Abstract

Canonical correlation analysis (CCA) has become a key tool for population neuroimaging, allowing investigation of associations between many imaging and non-imaging measurements. As other variables are often a source of variability not of direct interest, previous work has used CCA on residuals from a model that removes these effects, then proceeded directly to permutation inference. We show that such a simple permutation test leads to inflated error rates. The reason is that residualisation introduces dependencies among the observations that violate the exchangeability assumption. Even in the absence of nuisance variables, however, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first. The reason is that a simple permutation scheme does not ignore the variability already explained by previous canonical variables. Here we propose solutions for both problems: in the case of nuisance variables, we show that transforming the residuals to a lower-dimensional basis where exchangeability holds results in a valid permutation test; for more general cases, with or without nuisance variables, we propose estimating the canonical correlations in a stepwise manner, removing at each iteration the variance already explained, while accommodating different numbers of variables on the two sides. We also discuss how to address the multiplicity of tests, proposing an admissible test that is not conservative, and provide a complete algorithm for permutation inference for CCA.
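The idea of moving the residuals to a lower-dimensional basis can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that the nuisance variables form a matrix `Z` (including an intercept column), and that a suitable basis is a matrix `Q` whose orthonormal columns span the orthogonal complement of the column space of `Z`. With that choice, `Q.T @ Y` has N - R rows (R being the rank of `Z`), is free of the nuisance effects, and keeps independent, equal-variance errors uncorrelated. All function names are hypothetical, and only the first canonical correlation is tested; the stepwise procedure for later canonical correlations is not covered.

```python
import numpy as np

def nuisance_free_basis(Z):
    """Return Q, N x (N - R), with orthonormal columns orthogonal to col(Z)."""
    # Full SVD: the first R left singular vectors span col(Z),
    # the remaining N - R span its orthogonal complement.
    U, s, _ = np.linalg.svd(Z, full_matrices=True)
    tol = s.max() * max(Z.shape) * np.finfo(float).eps
    R = int((s > tol).sum())
    return U[:, R:]

def first_cc(Y, X):
    """First canonical correlation, without mean-centring.

    Assumption: the intercept was part of Z and is already projected out.
    """
    Qy, _ = np.linalg.qr(Y)
    Qx, _ = np.linalg.qr(X)
    return np.linalg.svd(Qy.T @ Qx, compute_uv=False)[0]

def permutation_pvalue_first_cc(Y, X, Z, n_perm=999, seed=0):
    """Permutation p-value for the first canonical correlation."""
    rng = np.random.default_rng(seed)
    Q = nuisance_free_basis(Z)
    # Work in the reduced basis, where rows are treated as exchangeable.
    Yt, Xt = Q.T @ Y, Q.T @ X
    r_obs = first_cc(Yt, Xt)
    exceed = 1  # the unpermuted data count as one permutation
    for _ in range(n_perm):
        perm = rng.permutation(Yt.shape[0])
        if first_cc(Yt[perm], Xt) >= r_obs:
            exceed += 1
    return r_obs, exceed / (n_perm + 1)
```

A call such as `permutation_pvalue_first_cc(Y, X, Z, n_perm=999)` returns the observed first canonical correlation and its p-value. Permuting the rows of `Q.T @ Y` rather than the rows of the N residuals is the point of the sketch: the N residuals are mutually dependent, whereas the N - R transformed rows are not under the stated assumptions.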

Paper Structure

This paper contains 29 sections, 4 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: A selection matrix is an identity matrix from which some specific rows have been removed. Pre-multiplication by a selection matrix deletes specific rows (those that correspond to columns that are all zero in the selection matrix).
  • Figure 2: Relationship between canonical correlations (horizontal axes) and associated p-values (vertical axes) for 10 realisations of scenario i, considering two estimation methods (single step and stepwise) and three multiple testing correction methods (uncorrected, corrected using the cumulative maximum, and corrected using the distribution of the maximum statistic). The figure complements Table \ref{tab:results:estimation+mtp} by showing example realisations that average to the error rates shown in the table for the cases in which the null space is included. With simple, uncorrected p-values, the test is inadmissible; with p-values corrected using the distribution of the maximum statistic, the test is overly conservative; single-step estimation does not control the familywise error rate.
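
The two operations named in the captions can be sketched in a few lines of NumPy. The first part builds a selection matrix as described for Figure 1 and checks that pre-multiplication deletes rows. The second part assumes that "corrected using the cumulative maximum" in Figure 2 means replacing the k-th uncorrected p-value with the largest p-value among the first k canonical correlations; that reading is an assumption based on the caption alone. The function names `selection_matrix` and `cumulative_max` are hypothetical.

```python
import numpy as np

def selection_matrix(n, keep):
    """Rows of the n x n identity matrix listed in `keep`."""
    return np.eye(n)[list(keep), :]

# Keep rows 0, 2 and 3 of a 5-row matrix.
S = selection_matrix(5, [0, 2, 3])
A = np.arange(10).reshape(5, 2)
# Columns 1 and 4 of S are all zero, so rows 1 and 4 of A are deleted.
reduced = S @ A

def cumulative_max(pvals):
    """Running maximum of p-values ordered from the first canonical correlation on."""
    return np.maximum.accumulate(np.asarray(pvals, dtype=float))
```

Under this reading, a later canonical correlation can never receive a smaller corrected p-value than an earlier one, which matches the ordered nature of the tests.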