Boosting multi-view association testing via devariation

Ruyi Pan, Yinqiu He, Jun Young Park

Abstract

Understanding the interplay between high-dimensional data from different views is essential in biomedical research, particularly in genomics, neuroimaging, and biobank-scale studies. Existing statistical tests for the association between two random vectors often fail to fully capture dependencies between views because they model within-view dependencies inadequately, particularly in high-dimensional data without clear dependency patterns, which can reduce statistical power. In this work, we propose devariation, a simple yet effective preprocessing method that addresses these limitations by adopting a penalized low-rank factor model to flexibly capture within-view dependencies. Theoretical analysis of asymptotic power shows that devariation increases statistical power, especially when within-view correlations impact signal-to-noise ratios, while maintaining robustness in scenarios without strong internal correlations. Simulation studies demonstrate devariation's superior performance over existing methods in various scenarios. We further validate devariation on multimodal neuroimaging data from the UK Biobank study, examining the associations between imaging-derived phenotypes (IDPs) from functional, structural, and diffusion magnetic resonance imaging (MRI).

Paper Structure

This paper contains 24 sections, 4 theorems, 24 equations, 4 figures.

Key Result

Theorem 1

Under Assumption assump:S2, $\lim_{n\to\infty}\beta_{\alpha}( {\bf X} , {\bf Y} )= \alpha$ under $H_1$.

Figures (4)

  • Figure 1: Correlation matrix generated from three-view imaging-derived phenotypes (IDPs) from the ‘population-level’ neuroimaging data from the UK Biobank study with $n = 39{,}587$ subjects. The sMRI view contains 339 features, including white surface area, thickness, and volume, grey matter volume, and subcortical volume. The dMRI view contains 432 skeleton measurements covering metrics such as Fractional Anisotropy (FA), Intracellular Volume Fraction (ICVF), Isotropic Volume Fraction (ISOVF), the eigenvalues of the diffusion tensor (L1, L2, L3), Mean Diffusivity (MD), Mode of Anisotropy (MA), and Orientation Dispersion (OD). The fMRI view includes 210 resting-state functional connectivity (rsFC) features.
  • Figure 2: Empirical power for different methods when data are generated from different models. Significance level $\alpha=0.05$ (dashed line).
  • Figure 3: Empirical power for different methods when data are generated from a low-rank model with different types of noise. Significance level $\alpha=0.05$ (dashed line).
  • Figure 4: Empirical power results of standard RV and devariation RV under different significance levels ($\alpha=0.01, 0.001$) across three pairs of data views.
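
Figure 4 compares a standard RV test with its devariation counterpart. As a rough illustration of the baseline only, the sketch below computes the classical RV coefficient between two views and a permutation p-value. The function names, the permutation scheme, and the simulated data are assumptions made for illustration and are not the authors' implementation; the devariation preprocessing itself is not shown here and would be applied to each view before the test.

```python
import numpy as np


def rv_coefficient(X, Y):
    """Classical RV coefficient between two views with matched rows (subjects)."""
    Xc = X - X.mean(axis=0)  # column-center each view
    Yc = Y - Y.mean(axis=0)
    # n x n Gram matrices; trace(Gx @ Gy) equals trace(Sxy @ Syx) up to scaling,
    # and the scaling cancels in the ratio below.
    Gx = Xc @ Xc.T
    Gy = Yc @ Yc.T
    num = np.sum(Gx * Gy)  # trace(Gx @ Gy), since both matrices are symmetric
    den = np.sqrt(np.sum(Gx * Gx) * np.sum(Gy * Gy))
    return num / den


def rv_permutation_test(X, Y, n_perm=199, seed=0):
    """Permutation p-value: shuffle the rows of Y to break the pairing with X."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(X, Y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        if rv_coefficient(X, Y[perm]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical simulated views: Y depends linearly on X plus noise.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 4))
    Y = X @ rng.standard_normal((4, 3)) + 0.5 * rng.standard_normal((60, 3))
    rv, p = rv_permutation_test(X, Y)
    print(f"RV = {rv:.3f}, permutation p-value = {p:.3f}")
```

Working with the $n \times n$ Gram matrices rather than the $p \times p$ covariance matrices keeps the cost manageable when the number of features far exceeds the number of subjects. For biobank-scale $n$, a permutation loop of this form would be expensive, and an asymptotic or otherwise approximated null distribution would likely be preferable.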

Theorems & Definitions (6)

  • Remark 1
  • Remark 2
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4