Beyond Correlation: Causal Multi-View Unsupervised Feature Selection Learning
Zongxin Shen, Yanyong Huang, Bin Wang, Jinyuan Chang, Shiyu Liu, Tianrui Li
TL;DR
This work addresses multi-view unsupervised feature selection (MUFS) on unlabeled multi-view data by asking whether the correlations used for feature selection are reliable, and proposes a causal framework in response. It introduces a structural causal model that shows how confounders can induce spurious associations, and then presents CAUSA, which couples a generalized unsupervised spectral regression model with a causal regularization module that adaptively separates confounders from the data and learns view-shared sample weights. The method jointly optimizes feature selection and confounder balancing, so that causally informative features can be identified across views; an efficient alternating optimization algorithm is developed to solve it. Extensive experiments on six real multi-view datasets and on synthetic data show that CAUSA outperforms state-of-the-art MUFS methods and demonstrate the value of causal reasoning in unsupervised multi-view feature selection.
Abstract
Multi-view unsupervised feature selection (MUFS) has recently received increasing attention for its promise in reducing the dimensionality of unlabeled multi-view data. Existing MUFS methods typically select discriminative features by capturing correlations between features and clustering labels. However, an important yet underexplored question remains: are such correlations sufficiently reliable to guide feature selection? In this paper, we analyze MUFS from a causal perspective by introducing a novel structural causal model, which reveals that existing methods may select irrelevant features because they overlook spurious correlations caused by confounders. Building on this causal perspective, we propose a novel MUFS method called CAusal multi-view Unsupervised feature Selection leArning (CAUSA). Specifically, we first employ a generalized unsupervised spectral regression model that identifies informative features by capturing dependencies between features and consensus clustering labels. We then introduce a causal regularization module that adaptively separates confounders from multi-view data and simultaneously learns view-shared sample weights to balance confounder distributions, thereby mitigating spurious correlations. Integrating both components into a unified learning framework enables CAUSA to select causally informative features. Comprehensive experiments demonstrate that CAUSA outperforms several state-of-the-art methods. To our knowledge, this is the first in-depth study of causal multi-view feature selection in the unsupervised setting.
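The idea that sample weights can balance a confounder and thereby suppress a spurious correlation can be illustrated with a small single-view toy example. This is a minimal sketch under strong simplifying assumptions and is not the CAUSA algorithm: the confounder `c` is assumed to be observed and binary, the continuous signal `y` stands in for a clustering label, and plain inverse-probability weighting is swapped in for the learned view-shared weights described in the abstract. All names (`balancing_weights`, `weighted_corr`, `t`, `x`, `y`) are hypothetical.

```python
import numpy as np

def weighted_corr(x, y, w):
    """Pearson correlation of x and y under sample weights w."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)

def balancing_weights(t, c):
    """Inverse-probability weights (an assumption of this sketch, not CAUSA's
    learned weights) that equalize the distribution of a discrete confounder c
    across the two groups of a binary feature t."""
    w = np.ones(len(t), dtype=float)
    for cv in np.unique(c):
        in_c = (c == cv)
        for tv in (0.0, 1.0):
            mask = in_c & (t == tv)
            if mask.any():
                # Empirical estimate of P(t = tv | c = cv)
                p = mask.sum() / in_c.sum()
                w[mask] = 1.0 / p
    return w

rng = np.random.default_rng(0)
n = 20000
c = rng.integers(0, 2, size=n)  # confounder (observed here for simplicity)
# Spurious feature: driven by the confounder only, with no effect on y
t = (rng.random(n) < np.where(c == 1, 0.8, 0.2)).astype(float)
# Causal feature: independent of the confounder, directly affects y
x = rng.normal(size=n)
# Stand-in for the cluster signal: depends on c and x, but not on t
y = 2.0 * c + 1.0 * x + rng.normal(scale=0.5, size=n)

uniform = np.ones(n)
w = balancing_weights(t, c)

raw_t = weighted_corr(t, y, uniform)  # inflated by the confounder
bal_t = weighted_corr(t, y, w)        # close to zero after balancing
raw_x = weighted_corr(x, y, uniform)  # genuine dependence
bal_x = weighted_corr(x, y, w)        # remains strong after balancing

print(f"spurious feature: raw={raw_t:.3f}, balanced={bal_t:.3f}")
print(f"causal feature:   raw={raw_x:.3f}, balanced={bal_x:.3f}")
```

In this toy setting, a correlation-based selector would rank `t` as informative on the raw data, whereas after reweighting only `x` retains a strong association with `y`. The setting described in the abstract is harder: the confounders are not observed and must be separated from the multi-view data while the weights are learned jointly with the feature selection objective.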
