Redundancy-optimized Multi-head Attention Networks for Multi-View Multi-Label Feature Selection
Yuzhou Liu, Jiarui Liu, Wanfu Gao
TL;DR
Multi-view multi-label (MVML) feature selection is challenged by complex relations among features, views and labels, and by feature redundancy. RMAN-MMFS assigns one attention head to each view to model intra-view feature relationships and feature-label interactions, and employs cross-view attention between heads to capture inter-view complementarity, augmented by static and dynamic redundancy penalties. These components are combined in a unified objective. Across six real-world datasets and against six baselines, the method achieves superior performance and produces compact, discriminative feature subsets with good generalization. The approach offers scalable, cross-view-aware feature selection with practical impact for MVML classification tasks.
Abstract
Multi-view multi-label data offers richer perspectives for artificial intelligence, but simultaneously presents significant challenges for feature selection due to the inherent complexity of interrelations among features, views and labels. Attention mechanisms provide an effective way to analyze these intricate relationships: they compute importance weights from the correlations between Query and Key matrices and use these weights to focus on the most pertinent Values. However, existing attention-based feature selection methods predominantly focus on intra-view relationships, neglecting the complementarity of inter-view features and the critical feature-label correlations. Moreover, they often fail to account for feature redundancy, potentially leading to suboptimal feature subsets. To overcome these limitations, we propose a novel method based on Redundancy-optimized Multi-head Attention Networks for Multi-view Multi-label Feature Selection (RMAN-MMFS). Specifically, we employ each individual attention head to model intra-view feature relationships and use cross-attention between different heads to capture inter-view feature complementarity. Furthermore, we design static and dynamic feature redundancy terms: the static term mitigates redundancy within each view, while the dynamic term explicitly models redundancy between unselected and selected features across the entire selection process, thereby promoting feature compactness. Comprehensive evaluations on six real-world datasets against six multi-view multi-label feature selection methods demonstrate the superior performance of the proposed method.
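The abstract names the components of RMAN-MMFS but not their equations, so the following is only a minimal, untrained NumPy sketch of how such pieces could fit together, under our own assumptions. Features and labels are treated as tokens (their centred, unit-norm sample columns, so a dot product is a Pearson correlation). Each view's "head" lets label tokens attend over that view's feature tokens to produce relevance weights. A cross-view head lets one view's features attend over another view's features, and a feature whose attended context resembles it little is scored as complementary. The static term is a feature's mean absolute correlation within its own view; the dynamic term is its maximum absolute correlation with the already-selected set, refreshed after every greedy pick. All function names and the weights `alpha`, `beta`, `gamma`, `tau` are hypothetical; the actual method learns its attention parameters through a unified objective, which this sketch does not reproduce.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def tokens(m):
    """Turn the d columns of an (n, d) matrix into d centred, unit-norm tokens of length n.
    With this encoding, the dot product of two tokens is their Pearson correlation."""
    t = (m - m.mean(axis=0)).T
    return t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-12)


def abs_corr_attention(q, k, tau):
    """Attention-style weights softmax(|Q K^T| / tau) over the key tokens (rows sum to 1).
    Using |.| (an assumption) lets negatively correlated tokens count as relevant."""
    return softmax(np.abs(q @ k.T) / tau, axis=1)


def view_relevance(x_v, y, tau):
    """Hypothetical per-view head: every label token attends over the view's feature tokens.
    Returns one relevance weight per feature; the weights sum to 1 within the view."""
    return abs_corr_attention(tokens(y), tokens(x_v), tau).mean(axis=0)


def cross_view_complementarity(x_v, x_u, tau):
    """Hypothetical cross-view head: view-v features attend over view-u features.
    Scores near 0 mean view u already holds a similar feature; near 1, it does not."""
    q, k = tokens(x_v), tokens(x_u)
    corr = q @ k.T                                   # (d_v, d_u) correlations
    w = softmax(np.abs(corr) / tau, axis=1)          # attention over view-u tokens
    ctx = (w * np.sign(corr)) @ k                    # sign-aligned attended context
    cos = np.sum(q * ctx, axis=1) / (np.linalg.norm(ctx, axis=1) + 1e-12)
    return 1.0 - cos


def static_redundancy(x_v):
    """Static term: mean |correlation| of each feature with the rest of its own view."""
    t = tokens(x_v)
    c = np.abs(t @ t.T)
    np.fill_diagonal(c, 0.0)
    return c.sum(axis=1) / max(c.shape[0] - 1, 1)


def select_features(views, y, k, alpha=0.5, beta=0.5, gamma=1.0, tau=0.1):
    """Greedily pick k column indices from the concatenated views (hypothetical scoring)."""
    rel, comp, stat = [], [], []
    for v, x_v in enumerate(views):
        rel.append(view_relevance(x_v, y, tau) * x_v.shape[1])  # rescale: mean 1 per view
        others = [cross_view_complementarity(x_v, x_u, tau)
                  for u, x_u in enumerate(views) if u != v]
        comp.append(np.mean(others, axis=0) if others else np.zeros(x_v.shape[1]))
        stat.append(static_redundancy(x_v))
    base = np.concatenate(rel) + beta * np.concatenate(comp) - alpha * np.concatenate(stat)
    t_all = tokens(np.hstack(views))
    corr = np.abs(t_all @ t_all.T)                   # |correlation| between all features
    dyn = np.zeros(len(base))                        # dynamic term w.r.t. the selected set
    selected = []
    for _ in range(k):
        score = base - gamma * dyn
        score[selected] = -np.inf                    # never re-select a feature
        j = int(np.argmax(score))
        selected.append(j)
        dyn = np.maximum(dyn, corr[j])               # max |corr| with any selected feature
    return selected


rng = np.random.default_rng(0)
n = 200
y = (rng.random((n, 3)) < 0.5).astype(float)         # 3 synthetic binary labels
view1 = np.hstack([y + 0.3 * rng.standard_normal((n, 3)), rng.standard_normal((n, 4))])
view2 = np.hstack([view1[:, :2] + 0.01 * rng.standard_normal((n, 2)),  # near-copies of view1
                   y[:, 2:] + 0.3 * rng.standard_normal((n, 1)),
                   rng.standard_normal((n, 3))])
print(select_features([view1, view2], y, k=4))
```

The static/dynamic distinction shows up in the greedy loop: `static_redundancy` is computed once per view, while `dyn` is refreshed after every pick, so an unselected feature's penalty can grow as the selected set grows. In the synthetic data, the first two columns of `view2` nearly duplicate columns of `view1`, which is the kind of cross-view redundancy the dynamic term is meant to catch.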
