Semantic-Preserving Feature Partitioning for Multi-View Ensemble Learning
Mohammad Sadegh Khorshidi, Navid Yazdanjue, Hassan Gharoun, Danial Yazdani, Mohammad Reza Nikoo, Fang Chen, Amir H. Gandomi
TL;DR
This paper introduces Semantic-Preserving Feature Partitioning (SPFP), an information-theoretic method for constructing semantically coherent artificial views for multi-view ensemble learning (MEL) from a single data source. By modifying the conditional likelihood framework (CLF) and defining an SPFP objective with $\nabla$ coefficients and a stopping rule based on entropy and mutual information, SPFP partitions features into multiple views that preserve the information content of the full feature set. The method is validated on eight real-world datasets using XGBoost and Logistic Regression, showing that SPFP-generated views and their ensembles often improve accuracy and reduce predictive uncertainty, while lowering computational cost through dimensionality reduction. Statistical analyses (Friedman test, Conover test, and Cliff's delta) reveal significant differences with large effect sizes, supporting the practical value of semantic view construction in MEL, though gains vary by dataset and model complexity. Overall, SPFP provides a rigorous, scalable approach to view construction that balances information preservation and computational cost, with potential extensions to unsupervised settings.
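As a rough illustration of the two quantities the stopping rule is built on, the sketch below estimates Shannon entropy and mutual information from discrete samples. It is not the SPFP objective or its stopping rule; the function names and the toy data are assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy data (hypothetical): feature_a determines the label exactly,
# while feature_b is statistically independent of it.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
feature_a = [0, 0, 1, 1, 0, 0, 1, 1]
feature_b = [0, 1, 0, 1, 0, 1, 0, 1]

mi_a = mutual_information(feature_a, labels)  # equals H(labels): fully informative
mi_b = mutual_information(feature_b, labels)  # zero: carries no label information
```

In this toy case a feature whose mutual information with the label equals the label entropy already carries all of the label's information, which is the kind of comparison an entropy-and-mutual-information criterion can make when deciding whether a view is informative enough.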
Abstract
In machine learning, the exponential growth of data and the associated "curse of dimensionality" pose significant challenges, particularly with expansive yet sparse datasets. To address these challenges, multi-view ensemble learning (MEL) has emerged as a transformative approach, with feature partitioning (FP) playing a pivotal role in constructing artificial views for MEL. Our study introduces the Semantic-Preserving Feature Partitioning (SPFP) algorithm, a novel method grounded in information theory. The SPFP algorithm partitions datasets into multiple semantically consistent views, enhancing the MEL process. In extensive experiments on eight real-world datasets, ranging from high-dimensional data with few instances to low-dimensional data with many instances, our method demonstrates notable efficacy. Where high generalization performance is achievable, it maintains model accuracy while significantly improving uncertainty measures. Conversely, where high generalization accuracy is less attainable, it retains uncertainty metrics while enhancing accuracy. An effect size analysis further reveals that the SPFP algorithm outperforms benchmark models with a large effect size and reduces computational demands through effective dimensionality reduction. The substantial effect sizes observed in most experiments underscore the algorithm's significant improvements in model performance.
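For readers unfamiliar with the effect size measure named in the TL;DR, Cliff's delta can be sketched as follows. This is a minimal illustration of the standard definition, not the paper's analysis code; the function name and the scores are hypothetical and are not results from the experiments.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y); ties count as zero."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical accuracy scores, for illustration only.
with_views = [0.91, 0.93, 0.92, 0.94]
baseline = [0.88, 0.90, 0.89, 0.92]

delta = cliffs_delta(with_views, baseline)
```

The statistic ranges from -1 to 1, with 0 meaning the two samples overlap completely. A commonly cited convention treats an absolute value of roughly 0.474 or more as a large effect; the thresholds applied in the paper are not stated in this section.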
