FSDEM: Feature Selection Dynamic Evaluation Metric

Muhammad Rajabinasab, Anton D. Lautrup, Tobias Hyrup, Arthur Zimek

TL;DR

This paper tackles the challenge of evaluating feature selection algorithms with expressive metrics by introducing FSDEM, a dynamic metric that jointly assesses performance and stability. FSDEM has two components. The FSDEM score is the area under the curve of a base performance measure $M(f)$ plotted against the number of selected features, where the curve is an approximating function $g(x)$ built from the observed performance values over a target range $[a, b]$. The stability score is based on the average first derivative of $g$ over that range, $S = \frac{\sum_{i=a}^{b} g'(i)}{(b-a) + 1}$. Implemented with linear approximation for $g$ and the trapezoidal rule for the area, FSDEM can be instantiated with any base metric and is demonstrated on 20 datasets with multiple feature selection strategies. The results show that FSDEM can reveal which method is best for a given feature budget, and that its stability score reflects the informative value of the selected features rather than whether exactly the same features are selected.
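The two quantities described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fsdem_scores` is hypothetical; $g$ is taken to be the piecewise-linear interpolant of the observed (feature count, performance) points, evaluated at every integer feature count in $[a, b]$; the area is computed with the trapezoidal rule and no normalization is applied; and $g'(i)$ is approximated with `numpy.gradient` (central differences in the interior, one-sided differences at the endpoints), since the derivative of a piecewise-linear function is undefined at its breakpoints.

```python
import numpy as np


def fsdem_scores(k_obs, m_obs, a, b):
    """Hypothetical sketch of an FSDEM-style performance score and stability score.

    k_obs: numbers of selected features at which the base measure M was evaluated.
    m_obs: value of M (e.g. accuracy) observed at each entry of k_obs.
    a, b:  integer bounds of the target feature-count range, with a < b.
    Returns (area, stability).
    """
    k_obs = np.asarray(k_obs, dtype=float)
    m_obs = np.asarray(m_obs, dtype=float)

    # np.interp requires increasing x-coordinates, so sort the observations.
    order = np.argsort(k_obs)
    k_obs, m_obs = k_obs[order], m_obs[order]

    # Assumption: g(x) is the linear interpolant of the observations, evaluated at
    # every integer feature count in [a, b]. Outside the observed range, np.interp
    # holds the first/last observed value constant.
    xs = np.arange(a, b + 1, dtype=float)
    g = np.interp(xs, k_obs, m_obs)

    # Area under g over [a, b] via the trapezoidal rule (no normalization applied).
    area = float(np.sum((g[1:] + g[:-1]) / 2.0 * np.diff(xs)))

    # Assumption: g'(i) is approximated by finite differences (np.gradient).
    # Stability: sum of g'(i) for i = a..b divided by (b - a) + 1.
    dg = np.gradient(g, xs)
    stability = float(np.sum(dg) / ((b - a) + 1))
    return area, stability


if __name__ == "__main__":
    # Illustrative observations: accuracy measured at 1, 3 and 5 selected features.
    area, stability = fsdem_scores([1, 3, 5], [0.6, 0.8, 0.9], a=1, b=5)
    print(f"area={area:.4f}, stability={stability:.4f}")
```

For these illustrative inputs the sketch returns an area of about 3.1 and a stability value of about 0.075. Evaluating $g$ only at the observed points and interpolating between them mirrors the idea, reflected in Figure 5, that the metric can still be computed when only part of the feature-count range has been observed.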

Abstract

Expressive evaluation metrics are indispensable for informative experiments in all areas, and while several metrics are established in some areas, in others, such as feature selection, only indirect or otherwise limited evaluation metrics are found. In this paper, we propose a novel evaluation metric to address several problems of its predecessors and allow for flexible and reliable evaluation of feature selection algorithms. The proposed metric is a dynamic metric with two properties that can be used to evaluate both the performance and the stability of a feature selection algorithm. We conduct several empirical experiments to illustrate the use of the proposed metric in the successful evaluation of feature selection algorithms. We also provide a comparison and analysis to show the different aspects involved in the evaluation of the feature selection algorithms. The results indicate that the proposed metric is successful in carrying out the evaluation task for feature selection algorithms. This paper is an extended version of a paper published at SISAP 2024.

Paper Structure (17 sections, 7 equations, 5 figures, 1 table)

Figures (5)

  • Figure 1: Example of the function approximation (left) and first-order derivative (right) with all and half of the observations, based on accuracy as the performance measure.
  • Figure 2: FSDEM score based on accuracy for two different feature selection methods and target feature number ranges.
  • Figure 3: FSDEM and stability score for the two scenarios.
  • Figure 4: FSDEM score and its associated stability for different algorithms and datasets based on accuracy and clustering accuracy. Note that the ranges are different.
  • Figure 5: Absolute difference of FSDEM score and its associated stability with all and half of the observations. Note that the ranges are different.