MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Xubin Wang, Haojiong Shangguan, Fengyi Huang, Shangrui Wu, Weijia Jia

TL;DR

MEL is a PSO-based multi-task evolutionary learning framework for high-dimensional feature selection. It splits the population into two subpopulations that each learn feature importance and transfer knowledge to guide the search, and it balances accuracy against parsimony with a fitness function that penalizes large feature subsets. Experiments on 12 high-dimensional genetic datasets and 10 large-sample datasets show that MEL achieves superior or competitive accuracy, produces compact feature subsets, and runs in favorable time relative to a wide range of baselines. The results indicate that the approach scales to ultra-high-dimensional settings, and the open-source code supports reproducibility and practical adoption.
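To make the accuracy/parsimony trade-off concrete, here is a minimal sketch of a fitness function that penalizes large feature subsets. The weighted-sum form, the name `fitness`, and the default `alpha=0.99` are assumptions borrowed from common wrapper-style feature selection practice; the summary above does not state MEL's exact formula.

```python
import numpy as np


def fitness(error_rate, mask, alpha=0.99):
    """Score a candidate feature subset; lower is better.

    Hypothetical weighted-sum form (an assumption, not MEL's stated formula):
        alpha * error_rate + (1 - alpha) * (selected features / total features)
    """
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        raise ValueError("mask must contain at least one feature position")
    # Fraction of features kept: this is the parsimony penalty term.
    size_ratio = mask.sum() / mask.size
    return alpha * error_rate + (1.0 - alpha) * size_ratio


# Two subsets with the same classification error: the smaller one scores better.
small = fitness(0.10, [1, 0, 0, 0])
large = fitness(0.10, [1, 1, 1, 1])
print(small < large)
```

With `alpha` close to 1, classification error dominates the score and subset size mostly breaks ties between equally accurate candidates.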

Abstract

Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computation (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, and these methods often underutilize information and do not share it effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Compared against 24 EC approaches, our method exhibits strong competitiveness. Additionally, we have open-sourced our code on GitHub at https://github.com/wangxb96/MEL.

Paper Structure

This paper contains 30 sections, 9 equations, 12 figures, 24 tables, 1 algorithm.

Figures (12)

  • Figure 1: A schematic diagram of the proposed MEL method. The parent population is divided into two subpopulations. $\vec{Sub_1}$ learns the feature importance during evolution, and its search is influenced by the best solution of $\vec{Sub_2}$. $\vec{Sub_2}$ also learns the importance of features during evolution, and searches for the optimal feature subset based on the results learned by both $\vec{Sub_1}$ and $\vec{Sub_2}$. In particular, features with higher weights have a higher probability of being selected.
  • Figure 2: Convergence Curves of Swarm-based EC Algorithms in Terms of Accuracy
  • Figure 3: Convergence Curves of Swarm-based EC Algorithms in Terms of the Size of Feature Subset
  • Figure 4: Convergence Curves of Nature-inspired EC Algorithms in Terms of Accuracy
  • Figure 5: Convergence Curves of Nature-inspired EC Algorithms in Terms of the Size of Feature Subset
  • ...and 7 more figures
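
The Figure 1 caption says that features with higher learned weights are more likely to be selected. The sketch below illustrates one way such weight-guided sampling could work. Everything in it is an assumption made for illustration: the function names, the min-max scaling into a bounded probability range, and the additive reward/penalty update are not taken from the paper or its repository.

```python
import numpy as np


def selection_probabilities(weights, floor=0.05, ceil=0.95):
    """Map feature weights to per-feature selection probabilities.

    Hypothetical scheme: min-max scale the weights into [floor, ceil] so that
    no feature is ever selected or excluded with certainty.
    """
    w = np.asarray(weights, dtype=float)
    span = w.max() - w.min()
    if span == 0:
        # All weights equal: no preference yet, so use a neutral probability.
        return np.full(w.shape, 0.5)
    scaled = (w - w.min()) / span
    return floor + (ceil - floor) * scaled


def sample_mask(weights, rng):
    """Draw a boolean feature mask; higher-weight features are picked more often."""
    p = selection_probabilities(weights)
    return rng.random(p.shape) < p


def update_weights(weights, mask, improved, step=0.1):
    """Reward the selected features if the subset improved fitness, else penalize them."""
    w = np.asarray(weights, dtype=float).copy()
    w[np.asarray(mask, dtype=bool)] += step if improved else -step
    return w


rng = np.random.default_rng(0)
weights = np.ones(6)
mask = sample_mask(weights, rng)
# Pretend this subset improved fitness, so its features gain weight.
weights = update_weights(weights, mask, improved=True)
print(selection_probabilities(weights))
```

In a two-subpopulation setting, each subpopulation could keep its own weight vector and the second could sample from a combination of both, which is roughly the knowledge transfer the caption describes; how MEL actually combines them is not specified in this summary.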