Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach

Azam Asilian Bidgoli, Shahryar Rahnamayan

TL;DR

LMSSS is a novel large-scale multi-objective evolutionary algorithm based on search space shrinking that tackles feature selection as a sparse optimization problem. Experiments show its potential to identify more accurate feature subsets than state-of-the-art large-scale feature selection algorithms.

Abstract

Feature selection is a crucial step in machine learning, especially for high-dimensional datasets, where irrelevant and redundant features can degrade model performance and increase computational costs. This paper proposes a novel large-scale multi-objective evolutionary algorithm based on search space shrinking, termed LMSSS, to tackle the challenges of feature selection, particularly as a sparse optimization problem. The method includes a shrinking scheme that reduces the dimensionality of the search space by eliminating irrelevant features before the main evolutionary process. This is achieved through a ranking-based filtering method that evaluates features based on their correlation with class labels and their frequency in an initial, cost-effective evolutionary process. Additionally, a smart crossover scheme based on voting between parent solutions is introduced, giving higher weight to the parent with better classification accuracy. An intelligent mutation process is also designed to target features prematurely excluded from the population, ensuring they are evaluated in combination with other features. These integrated techniques allow the evolutionary process to explore the search space more efficiently and effectively, addressing the sparse and high-dimensional nature of large-scale feature selection problems. The effectiveness of the proposed algorithm is demonstrated through comprehensive experiments on 15 large-scale datasets, showcasing its potential to identify more accurate feature subsets than state-of-the-art large-scale feature selection algorithms. These results highlight LMSSS's capability to improve model performance and computational efficiency, setting a new benchmark in the field.
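The voting-based crossover described in the abstract can be illustrated with a small sketch. The abstract does not give the exact rule, so everything here is an assumption: solutions are taken to be binary feature masks, `voting_crossover` is a hypothetical name, and `pr` is assumed to be the probability that the more accurate parent wins a disagreement (a value above 0.5 gives that parent the higher weight).

```python
import random


def voting_crossover(p1, p2, acc1, acc2, pr=0.7, rng=None):
    """Weighted-vote crossover on binary feature masks (illustrative sketch).

    p1, p2     : parent masks (lists of 0/1), one entry per feature
    acc1, acc2 : classification accuracy of each parent
    pr         : assumed probability that the more accurate parent
                 wins when the two parents disagree on a feature
    """
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()

    # Identify which parent carries the heavier vote.
    better, worse = (p1, p2) if acc1 >= acc2 else (p2, p1)

    child = []
    for b, w in zip(better, worse):
        if b == w:
            # Both parents agree: the vote is unanimous.
            child.append(b)
        else:
            # Parents disagree: the more accurate parent wins with probability pr.
            child.append(b if rng.random() < pr else w)
    return child


if __name__ == "__main__":
    parent_a = [1, 0, 1, 0, 0, 1]
    parent_b = [1, 1, 0, 0, 1, 1]
    print(voting_crossover(parent_a, parent_b, acc1=0.91, acc2=0.84,
                           rng=random.Random(0)))
```

Features on which both parents agree are always inherited, so the child differs from the better parent only where the weaker parent wins a vote.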

Paper Structure

This paper contains 21 sections, 10 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: Framework overview. The process steps, marked with distinct colors, are: A. detecting highly correlated features; B. a lightweight evolutionary process on the $n_f$ top features with the highest MIC, which selects high-frequency features from $r$ runs of the evolutionary process; C. selecting NDS-based features in terms of frequency and MIC; D. an evolutionary process on the top NDS-based features that selects the final set of features.
  • Figure 2: NDS-based features. Each plot shows the top features in the first 10 ranks, based on MIC and frequency, obtained with the NDS algorithm. The multi-criteria ranking is visualized as maximization-maximization Pareto fronts.
  • Figure 3: A sample crossover. $P_1$ and $P_2$ are the two selected parents, where $P_1$ is a better feature subset than $P_2$ in terms of classification accuracy and $pr>0.5$.
  • Figure 4: Top: Pareto fronts obtained by all competing algorithms on sample training sets. Bottom: Pareto fronts obtained by all competing algorithms on sample test sets.
  • Figure 5: The running time of a single run for each algorithm across different datasets. The times are measured in seconds. Each bar represents the running time of one algorithm for each dataset on the x-axis.
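
The NDS-based ranking mentioned in the captions of Figures 1 and 2 can be sketched as a plain non-dominated sort over two maximized criteria per feature. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the (MIC, frequency) pairs are assumed to be precomputed, and the cutoff of 10 ranks is taken from the Figure 2 caption.

```python
def dominates(a, b):
    """True if point a dominates point b when every criterion is maximized."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def nds_ranks(scores):
    """Assign each feature a Pareto rank (1 = first non-dominated front).

    scores: list of (mic, frequency) tuples, one per feature.
    """
    remaining = set(range(len(scores)))
    ranks = [0] * len(scores)
    rank = 1
    while remaining:
        # The current front holds every remaining feature that no other
        # remaining feature dominates.
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def top_features(scores, max_rank=10):
    """Indices of features whose Pareto rank is at most max_rank."""
    return [i for i, r in enumerate(nds_ranks(scores)) if r <= max_rank]


if __name__ == "__main__":
    # Made-up (MIC, frequency) pairs, for illustration only.
    demo = [(0.9, 5), (0.5, 9), (0.4, 4), (0.9, 5), (0.1, 1)]
    print(nds_ranks(demo))
    print(top_features(demo, max_rank=1))
```

This simple version compares every pair of features in each round, which is adequate for a sketch; a faster sorting routine would be preferable for very large feature sets.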