Graph Random Walk with Feature-Label Space Alignment: A Multi-Label Feature Selection Method
Wanfu Gao, Jun Gao, Qingqi Han, Hanlin Pan, Kunpeng Liu
TL;DR
This paper addresses two problems in high-dimensional multi-label feature selection: feature-label relationships that are nonlinear or indirect, and misalignment between the feature space and the label space. It introduces GRW-SCMF, a graph-based method that builds a feature-label composite graph and uses a Random Walk Mutual Information matrix to capture both direct and high-order indirect associations, combined with a shared low-dimensional latent space learned via nonnegative matrix factorization. The objective integrates reconstruction errors, cross-space alignment, and a sparse regularizer; it is optimized with a relaxed alternating-multiplier scheme and KKT-based updates that ensure non-negativity and convergence. Across seven diverse datasets, GRW-SCMF outperforms the compared feature selection methods, and ablations confirm that both the random-walk component and the cross-space alignment contribute to the gain. The method thus pairs explicit linear decomposition with implicit nonlinear relationship mining for feature selection on high-dimensional multi-label data.
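To make the shared latent space idea concrete, the following is a minimal sketch of coupled nonnegative matrix factorization, not the GRW-SCMF objective itself. It assumes nonnegative inputs, factors the feature matrix and the label matrix through one shared factor `V`, and uses standard multiplicative updates; the alignment and sparsity terms are omitted. The function name `shared_latent_nmf`, the weight `alpha`, and the scoring rule in the usage note are illustrative assumptions.

```python
import numpy as np

def shared_latent_nmf(X, Y, k=5, alpha=1.0, iters=200, eps=1e-10, seed=0):
    """Factor X ~ V @ B and Y ~ V @ Q with one shared nonnegative V.

    X: (n, d) nonnegative feature matrix.
    Y: (n, q) nonnegative label matrix.
    Returns V (n, k), B (k, d), Q (k, q) and the per-iteration loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    q = Y.shape[1]
    V = rng.random((n, k)) + eps
    B = rng.random((k, d)) + eps
    Q = rng.random((k, q)) + eps
    history = []
    for _ in range(iters):
        # Multiplicative updates: each factor is scaled by a nonnegative
        # ratio, so nonnegativity is preserved without projection.
        V *= (X @ B.T + alpha * Y @ Q.T) / (V @ (B @ B.T + alpha * Q @ Q.T) + eps)
        B *= (V.T @ X) / (V.T @ V @ B + eps)
        Q *= (V.T @ Y) / (V.T @ V @ Q + eps)
        # Joint reconstruction error over both spaces.
        loss = (np.linalg.norm(X - V @ B) ** 2
                + alpha * np.linalg.norm(Y - V @ Q) ** 2)
        history.append(loss)
    return V, B, Q, history
```

One hypothetical way to turn the factors into a feature ranking is to score feature j by the norm of column j of `B`, for example `np.argsort(-np.linalg.norm(B, axis=0))`; the paper's own scoring rule may differ.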
Abstract
The rapid growth in feature dimensionality can introduce implicit associations between features and labels in multi-label datasets, making feature-label relationships increasingly complex. Existing methods often rely on low-dimensional linear decomposition to explore these associations, but linear decomposition struggles to capture complex nonlinear associations and may lead to misalignment between the feature space and the label space. We address these two challenges as follows. First, we design a random walk graph that integrates feature-feature, label-label, and feature-label relationships to capture nonlinear and implicit indirect associations, while optimizing the latent representations of feature-label associations obtained after low-rank decomposition. Second, we align the variable spaces through low-dimensional representation coefficients while preserving the manifold structure between the original high-dimensional multi-label data and the low-dimensional representation space. Extensive experiments and ablation studies on seven benchmark datasets and three representative datasets, using various evaluation metrics, demonstrate the superiority of the proposed method. Code: https://github.com/Heilong623/-GRW-
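The random walk idea in the abstract can be illustrated with a small sketch. It is a simplified stand-in rather than the authors' implementation: edge weights here are absolute Pearson correlations instead of mutual information, and the names `composite_graph`, `random_walk_affinity`, `steps`, and `decay` are assumptions introduced for illustration. The sketch builds one graph over feature and label nodes, row-normalizes it into a transition matrix, and sums decayed multi-step transition probabilities so that a feature can reach a label through intermediate nodes.

```python
import numpy as np

def composite_graph(X, Y):
    """Weighted adjacency over d feature nodes followed by q label nodes.

    Assumption: absolute Pearson correlation stands in for the
    mutual-information weights described in the paper.
    """
    Z = np.hstack([X, Y])
    W = np.abs(np.corrcoef(Z, rowvar=False))
    W = np.nan_to_num(W)          # constant columns yield NaN correlations
    np.fill_diagonal(W, 0.0)      # no self-loops
    return W

def random_walk_affinity(W, n_features, steps=3, decay=0.5):
    """Feature-to-label affinities accumulated over 1..steps walk lengths."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize into a transition matrix; isolated nodes keep zero rows.
    P = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    Pk = P.copy()
    M = P.copy()                  # one-step (direct) associations
    for k in range(2, steps + 1):
        Pk = Pk @ P               # k-step transition probabilities
        M += decay ** (k - 1) * Pk
    # Keep only the block that maps feature nodes to label nodes.
    return M[:n_features, n_features:]

# Chain graph: feature 0 - feature 1 - label 0, with no direct edge
# between feature 0 and the label.
W_chain = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
A = random_walk_affinity(W_chain, n_features=2, steps=2, decay=0.5)
# A[0, 0] == 0.25: feature 0 reaches the label only via feature 1.
# A[1, 0] == 0.5:  feature 1 reaches the label directly.
```

In the chain example, a purely direct measure would assign feature 0 a score of zero, while the two-step walk gives it a nonzero affinity through its neighbor. This is the kind of indirect association the abstract refers to.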
