MLRS-PDS: A Meta-learning recommendation of dynamic ensemble selection pipelines

Hesam Jalalian, Rafael M. O. Cruz

TL;DR

Dynamic Selection (DS) performance hinges on the chosen pool of classifiers and the DS method, motivating automated, dataset-aware pipeline design. The authors introduce MLRS, a multi-label meta-learning framework that maps dataset meta-features to preferred DS configurations, with three variants: MLRS-P, MLRS-DS, and MLRS-PDS, including a chained approach for full automation. Trained on a meta-dataset derived from 129 meta-features and evaluated on 288 datasets, MLRS variants outperform fixed-pool and fixed-DS baselines, with MLRS-PDS delivering the strongest gains by jointly selecting the pool and DS. The work demonstrates the practical value of meta-learning in AutoML-like DS pipeline design, enabling efficient, dataset-specific DS configurations without exhaustive search.

Abstract

Dynamic Selection (DS), where base classifiers are chosen from a pool of classifiers for each new instance at test time, has been shown to be highly effective in pattern recognition. However, instability and redundancy in the classifier pools can impede computational efficiency and accuracy in dynamic ensemble selection (DES). This paper introduces a meta-learning recommendation system (MLRS) to recommend the optimal pool generation scheme for DES methods tailored to individual datasets. The system employs a meta-model built from dataset meta-features to predict the most suitable pool generation scheme and DES method for a given dataset. Through an extensive experimental study encompassing 288 datasets, we demonstrate that this meta-learning recommendation system outperforms traditional fixed pool or DES method selection strategies, highlighting the efficacy of a meta-learning approach in refining DES method selection. The source code, datasets, and supplementary results can be found in this project's GitHub repository: https://github.com/Menelau/MLRS-PDS.

Paper Structure (18 sections, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: Overview of the meta-training process. In step 1, the meta-features, $mf$, are extracted from each training dataset to generate its representation $x'_{i}$. In step 2, the set of pools and DS methods is evaluated. In step 3, the meta-target, $y'$, is defined based on the highest accuracy. In step 4, the meta-dataset, $MT$, is constructed, and in step 5 it is used to train a meta-model, $\lambda$.
  • Figure 2: The meta-learning recommendation process for the three distinct scenarios. The red arrow indicates the inputs (choices) provided by the user. In Scenario I, a pool generation scheme is recommended based on the dataset characteristics, conditional on the DS model specified by the user. Scenario II recommends a DS method based on the dataset characteristics and the pre-selected pool generation scheme. Scenario III recommends the best pair of (Pool, DS) without requiring user input. It is crucial to note that only the training set partition of the new query dataset $\mathbf{Q}$ is used for extracting meta-features, thereby preventing any data leakage from the test data.
  • Figure 3: Number of occurrences where each configuration attained the best result. a) Best pool generation schemes for the fixed META-DES technique. b) Best DS method for the fixed BP pool generation scheme.
  • Figure 4: Number of occurrences in which each configuration attains the best result.
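
The five meta-training steps in Figure 1 and the fully automatic Scenario III in Figure 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four toy meta-features, the `evaluate` callback, the `tol` parameter, the placeholder configuration names (`pool_A`, `ds_A`, ...), the accuracy values, and the nearest-neighbor meta-model are all hypothetical stand-ins for the paper's 129 meta-features, its real pool/DS evaluations, and its multi-label meta-model $\lambda$.

```python
import math
from collections import Counter


def extract_meta_features(X, y):
    """Step 1: map a dataset to a meta-feature vector (toy, hypothetical features)."""
    n, d = len(X), len(X[0])
    counts = Counter(y)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [math.log10(n), float(d), float(len(counts)), entropy]


def build_meta_target(accuracies, tol=0.0):
    """Step 3: multi-label target, 1 for every configuration within tol of the best."""
    best = max(accuracies.values())
    return {cfg: int(acc >= best - tol) for cfg, acc in accuracies.items()}


def build_meta_dataset(datasets, evaluate):
    """Steps 1-4: one (meta-features, meta-target) row per training dataset.

    `evaluate(X, y)` is an assumed callback that performs step 2 and returns
    {(pool, ds_method): accuracy} for every candidate pair.
    """
    return [
        (extract_meta_features(X, y), build_meta_target(evaluate(X, y)))
        for X, y in datasets
    ]


class NearestNeighborMetaModel:
    """Step 5 stand-in: 1-NN over meta-features (not the paper's meta-model)."""

    def fit(self, meta_dataset):
        self.rows = list(meta_dataset)
        return self

    def recommend(self, meta_features):
        # Return the configurations labeled 1 for the closest training dataset.
        _, target = min(self.rows, key=lambda row: math.dist(row[0], meta_features))
        return sorted(cfg for cfg, flag in target.items() if flag)


def toy_evaluate(X, y):
    """Made-up accuracies for illustration only; a real run would train each pipeline."""
    if len(X[0]) == 2:
        return {("pool_A", "ds_A"): 0.90, ("pool_B", "ds_B"): 0.80}
    return {("pool_A", "ds_A"): 0.70, ("pool_B", "ds_B"): 0.85}


train_datasets = [
    ([[0, 0], [1, 1], [0, 1], [1, 0]], [0, 1, 1, 0]),
    ([[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0]],
     [0, 0, 0, 0, 1, 2]),
]
model = NearestNeighborMetaModel().fit(build_meta_dataset(train_datasets, toy_evaluate))

# Scenario III: only the training partition of the query dataset is used
# to extract meta-features, so no test data leaks into the recommendation.
query_X_train, query_y_train = [[2, 3], [4, 5], [6, 7], [8, 9]], [1, 0, 1, 0]
recommended = model.recommend(extract_meta_features(query_X_train, query_y_train))
print(recommended)
```

Scenarios I and II would follow the same pattern with the candidate set restricted to the pool schemes (for a user-chosen DS method) or to the DS methods (for a user-chosen pool scheme).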