Nonlinear Sparse Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data
Rong Wu, Ziqi Chen, Gen Li, Hai Shu
TL;DR
This work tackles the integration of multi-view high-dimensional biomedical data by developing nonlinear, sparse generalized CCA (GCCA) methods that extend two-view nonlinear approaches to K-view settings. The core method, HSIC-SGCCA, enforces a unit-variance constraint and solves the resulting nonconvex optimization problem by combining the block prox-linear method with the linearized alternating direction method of multipliers (LADMM), while SA-KGCCA and TS-KGCCA provide multi-convex, kernelized alternatives solved by block coordinate descent. Empirical results on simulations and TCGA-BRCA data show that HSIC-SGCCA achieves superior variable selection and improves downstream tasks such as cancer subtype separation and survival prediction. The methods offer practical tools for robust multi-view integration, with potential extensions to supervised and structure-informed variants to further improve interpretability and predictive performance.
Abstract
Motivation: Biomedical studies increasingly produce multi-view high-dimensional datasets (e.g., multi-omics) that demand integrative analysis. Existing canonical correlation analysis (CCA) and generalized CCA methods address at most two of the following three key aspects simultaneously: (i) nonlinear dependence, (ii) sparsity for variable selection, and (iii) generalization to more than two data views. There is a pressing need for CCA methods that integrate all three aspects to effectively analyze multi-view high-dimensional data.

Results: We propose three nonlinear, sparse, generalized CCA methods, HSIC-SGCCA, SA-KGCCA, and TS-KGCCA, for variable selection in multi-view high-dimensional data. These methods extend existing SCCA-HSIC, SA-KCCA, and TS-KCCA from two-view to multi-view settings. While SA-KGCCA and TS-KGCCA yield multi-convex optimization problems solved via block coordinate descent, HSIC-SGCCA introduces a necessary unit-variance constraint previously ignored in SCCA-HSIC, resulting in a nonconvex, non-multiconvex problem. We efficiently address this challenge by integrating the block prox-linear method with the linearized alternating direction method of multipliers. Simulations and TCGA-BRCA data analysis demonstrate that HSIC-SGCCA outperforms competing methods in multi-view variable selection.
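
To make the dependence measure behind HSIC-SGCCA concrete, the following is a minimal sketch of the standard (biased) empirical Hilbert-Schmidt independence criterion between two one-dimensional samples, together with a unit-variance rescaling of a projected view. It is an illustration only and not the authors' implementation: the Gaussian kernel, the median bandwidth heuristic, the summation over all pairs of views, and the function names (gaussian_gram, hsic, unit_variance_projection, pairwise_hsic_objective) are assumptions made here, and the sparsity penalty and the block prox-linear/LADMM solver are omitted.

```python
import numpy as np


def gaussian_gram(x, sigma=None):
    """Gaussian-kernel Gram matrix of a 1-D sample x (length n)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq = (x - x.T) ** 2  # pairwise squared distances
    if sigma is None:
        # Median bandwidth heuristic (an assumption for this sketch).
        pos = sq[sq > 0]
        sigma = np.sqrt(0.5 * np.median(pos)) if pos.size else 1.0
    return np.exp(-sq / (2.0 * sigma ** 2))


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x)
    L = gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def unit_variance_projection(X, u):
    """Project the n-by-p view X onto u, then center and rescale
    so that the projection has unit sample variance."""
    z = X @ u
    z = z - z.mean()
    return z / z.std(ddof=1)


def pairwise_hsic_objective(views, weights):
    """Sum of HSIC over all pairs of projected views (assumed form
    of a multi-view criterion; no sparsity penalty included)."""
    z = [unit_variance_projection(X, u) for X, u in zip(views, weights)]
    total = 0.0
    for j in range(len(z)):
        for k in range(j + 1, len(z)):
            total += hsic(z[j], z[k])
    return total
```

In this sketch, a nonlinearly related pair such as x and x squared plus noise yields a larger HSIC value than an independent pair, which is the kind of dependence that a linear correlation would miss.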
