
MOELIGA: a multi-objective evolutionary approach for feature selection with local improvement

Leandro Vignolo, Matias Gerard

Abstract

Selecting the most relevant or informative features is a key issue in real-world machine learning problems. Since an exhaustive search is not feasible even for a moderate number of features, an intelligent search strategy must be employed to find an optimal subset, which requires considering how features interact with each other in promoting class separability. Balancing feature subset size and classification accuracy constitutes a multi-objective optimization challenge. Here we propose MOELIGA, a multi-objective genetic algorithm incorporating an evolutionary local improvement strategy that evolves subordinate populations to refine feature subsets. MOELIGA employs a crowding-based fitness sharing mechanism and a sigmoid transformation to enhance diversity and guide compactness, alongside a geometry-based objective promoting classifier independence. Experimental evaluation on 14 diverse datasets demonstrates MOELIGA's ability to identify smaller feature subsets with superior or comparable classification performance relative to 11 state-of-the-art methods. These findings suggest MOELIGA effectively addresses the accuracy-dimensionality trade-off, offering a robust and adaptable approach for multi-objective feature selection in complex, high-dimensional scenarios.
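
The abstract names two mechanisms, a sigmoid transformation of the feature count and a fitness sharing scheme, without giving their formulas. The sketch below is only an illustration under stated assumptions: the logistic curve centered at half of the features, the steepness value `lam`, and the function names are all hypothetical, and the sharing function is the classic triangular form from the niching literature, not necessarily the crowding-based variant used in MOELIGA.

```python
import math


def sigmoid_size_score(n_selected, n_total, lam=10.0):
    """Map the fraction of selected features into (0, 1) with a logistic curve.

    Assumed form (not taken from the paper): centered at half of the
    features, with steepness `lam`. Smaller subsets get smaller scores.
    """
    frac = n_selected / n_total
    return 1.0 / (1.0 + math.exp(-lam * (frac - 0.5)))


def sharing(distance, sigma, alpha=1.0):
    """Classic triangular sharing function.

    Returns 1 at distance 0 and 0 at or beyond the niche radius `sigma`.
    """
    if distance >= sigma:
        return 0.0
    return 1.0 - (distance / sigma) ** alpha


def shared_fitness(raw_fitness, distances, sigma):
    """Divide raw fitness by the niche count.

    `distances` holds the distance from one individual to every member of
    the population, including itself (distance 0), so the niche count is
    always at least 1.
    """
    niche_count = sum(sharing(d, sigma) for d in distances)
    return raw_fitness / niche_count
```

In this sketch, an individual with many close neighbors has a large niche count and therefore a reduced shared fitness, which is how sharing schemes in general discourage the population from collapsing onto a single region of the search space.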

Paper Structure (22 sections, 14 equations, 9 figures, 4 tables, 3 algorithms)


Figures (9)

  • Figure 1: Approach with evolutionary local improvement for multi-objective feature selection.
  • Figure 2: Frequency distributions for different hyperparameter settings considered in screening phase 1. Upper left: number of validation tests (1 test vs. 3 tests); Upper right: number of objectives (2 objectives vs. 3 objectives); Lower left: number of subordinate populations (SP) (no SP vs. 3 SP); Lower right: replacement strategy (PR, CR and SR).
  • Figure 3: Frequency distributions for different hyperparameter settings considered in screening phase 2. Left: use of the sigmoid function to normalize feature counts in objective function II; Center: Parameter $\sigma$ of the sharing function; Right: value of parameter $\lambda$ in sigmoid function.
  • Figure 4: Comparison of the Pareto Fronts in the first and last generation for dataset Movement. Small orange dots correspond to the Pareto solutions from the first generation, while large dots represent the solutions from the last generation and are colored by the corresponding values for R1.
  • Figure 5: Comparison of the Pareto Fronts in the first and last generation for datasets GCM (left) and Mfeat (right), considering the pair of most relevant objectives (Objective 1 and Objective 2). Small orange dots correspond to the Pareto solutions from the first generation, while large dots represent the solutions from the last generation and are colored by the corresponding values for R1.
  • ...and 4 more figures
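
The Pareto fronts mentioned in the captions of Figures 4 and 5 can be illustrated with a generic non-dominated filter. This is a minimal sketch, not the authors' implementation: it assumes every objective is minimized (for example, classification error and number of selected features), and the sample points are made up purely for illustration.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error, n_features) pairs, for illustration only.
candidates = [(0.10, 12), (0.12, 5), (0.10, 20), (0.30, 5), (0.08, 30)]
front = pareto_front(candidates)
```

Here `(0.10, 20)` is dropped because `(0.10, 12)` reaches the same error with fewer features, and `(0.30, 5)` is dropped because `(0.12, 5)` has lower error at the same size. The three remaining points each trade error against subset size.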