MLRS-PDS: A Meta-learning recommendation of dynamic ensemble selection pipelines

Hesam Jalalian; Rafael M. O. Cruz

TL;DR

Dynamic Selection (DS) performance hinges on the chosen pool of classifiers and the DS method, motivating automated, dataset-aware pipeline design. The authors introduce MLRS, a multi-label meta-learning framework that maps dataset meta-features to preferred DS configurations, with three variants: MLRS-P, MLRS-DS, and MLRS-PDS, including a chained approach for full automation. Trained on a meta-dataset derived from 129 meta-features and evaluated on 288 datasets, MLRS variants outperform fixed-pool and fixed-DS baselines, with MLRS-PDS delivering the strongest gains by jointly selecting the pool and DS. The work demonstrates the practical value of meta-learning in AutoML-like DS pipeline design, enabling efficient, dataset-specific DS configurations without exhaustive search.

Abstract

Dynamic Selection (DS), where base classifiers are chosen from a pool of classifiers for each new instance at test time, has been shown to be highly effective in pattern recognition. However, instability and redundancy in the classifier pools can impede computational efficiency and accuracy in dynamic ensemble selection (DES). This paper introduces a meta-learning recommendation system (MLRS) to recommend the optimal pool generation scheme for DES methods tailored to individual datasets. The system employs a meta-model built from dataset meta-features to predict the most suitable pool generation scheme and DES method for a given dataset. Through an extensive experimental study encompassing 288 datasets, we demonstrate that this meta-learning recommendation system outperforms traditional fixed pool or DES method selection strategies, highlighting the efficacy of a meta-learning approach in refining DES method selection. The source code, datasets, and supplementary results can be found in this project's GitHub repository: https://github.com/Menelau/MLRS-PDS.

Paper Structure (18 sections, 4 figures, 3 tables, 1 algorithm)

Introduction
Related work
The Proposed Multi-label meta-learning recommendation (MLRS)
MLRS Training process
Meta-features
Meta-learning recommendation
Experimental setup
Pool generation schemes
DS Techniques
Datasets
Experimental setup
Meta-learner definition
Results
Scenario I: meta-learning for recommending the best pool generation scheme
Scenario II: meta-learning for recommending the best DS model
...and 3 more sections

Figures (4)

Figure 1: Overview of the meta-training process. In step 1, the meta-features, $mf$, are extracted from each training dataset to generate its representation $x'_{i}$. In step 2, the set of pools and DS methods is evaluated. Then, based on the highest accuracy, the meta-target, $y'$, is defined (step 3). In step 4, the meta-dataset, $MT$, is constructed, and it is then used to train a meta-model, $\lambda$ (step 5).
Figure 2: The meta-learning recommendation process for the three distinct scenarios. The red arrow indicates the inputs (choices) provided by the user. In Scenario I, a pool generation scheme is recommended based on the dataset characteristics, conditional on the DS model specified by the user. Scenario II recommends a DS method based on the dataset characteristics and the pre-selected pool generation scheme. Scenario III recommends the best pair of (Pool, DS) without requiring user input. It is crucial to note that only the training set partition of the new query dataset $\mathbf{Q}$ is used for extracting meta-features, thereby preventing any data leakage from the test data.
Figure 3: Number of occurrences where each configuration attained the best result. a) Best pool generation schemes for the fixed META-DES technique. b) Best DS method for the fixed BP pool generation scheme.
Figure 4: Number of occurrences where each configuration attained the best result.
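
The five meta-training steps in the Figure 1 caption, and the leakage-free recommendation step in the Figure 2 caption, can be illustrated with a minimal sketch. It is not the authors' implementation: the three toy meta-features (the paper uses 129), the tie-based multi-label target, the 1-nearest-neighbour meta-model, the `toy_evaluate` stub, and the configuration names `pool_A`, `pool_B`, `ds_1`, `ds_2` are all assumptions made for illustration.

```python
import math
from collections import Counter


def extract_meta_features(X, y):
    """Step 1: toy meta-features (assumed): log10(#instances), #features, class entropy."""
    n, d = len(X), len(X[0])
    entropy = -sum((c / n) * math.log2(c / n) for c in Counter(y).values())
    return [math.log10(n), float(d), entropy]


def build_meta_target(accuracies, configs):
    """Step 3: multi-label target, 1 for every configuration tied for the highest accuracy."""
    best = max(accuracies[c] for c in configs)
    return [1 if accuracies[c] == best else 0 for c in configs]


def build_meta_dataset(datasets, evaluate, configs):
    """Steps 1-4: one (meta-features, meta-target) row per training dataset."""
    meta_X, meta_Y = [], []
    for X, y in datasets:
        meta_X.append(extract_meta_features(X, y))
        accuracies = {c: evaluate(X, y, c) for c in configs}  # step 2
        meta_Y.append(build_meta_target(accuracies, configs))
    return meta_X, meta_Y


class NearestNeighborMetaModel:
    """Step 5 stand-in (assumed): 1-nearest-neighbour over meta-features."""

    def fit(self, meta_X, meta_Y):
        self.meta_X, self.meta_Y = meta_X, meta_Y
        return self

    def recommend(self, x_query, configs):
        # Return every configuration flagged as best for the closest meta-example.
        dists = [math.dist(x_query, row) for row in self.meta_X]
        nearest = dists.index(min(dists))
        return [c for c, flag in zip(configs, self.meta_Y[nearest]) if flag]


# Hypothetical (pool, DS) configuration names.
CONFIGS = [("pool_A", "ds_1"), ("pool_A", "ds_2"), ("pool_B", "ds_1")]


def toy_evaluate(X, y, config):
    """Hypothetical stand-in for training and scoring a (pool, DS) pipeline."""
    favoured = CONFIGS[0] if len(X[0]) <= 2 else CONFIGS[2]
    return 0.9 if config == favoured else 0.8


narrow = ([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 0, 1, 1])
wide = ([[0] * 5, [1] * 5, [2] * 5, [3] * 5], [0, 0, 1, 1])
meta_X, meta_Y = build_meta_dataset([narrow, wide], toy_evaluate, CONFIGS)
model = NearestNeighborMetaModel().fit(meta_X, meta_Y)

# Only the training partition of the query dataset Q is used for meta-features.
Q_train_X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 0], [5, 2]]
Q_train_y = [0, 1, 0, 1, 0, 1]
recommended = model.recommend(extract_meta_features(Q_train_X, Q_train_y), CONFIGS)
print(recommended)
```

In this sketch the meta-target is a binary vector over configurations, so several configurations can be marked as best for the same dataset, which is one simple way to read the multi-label framing. In practice `toy_evaluate` would be replaced by actually training each pool and DS method and measuring accuracy, and the nearest-neighbour stand-in by whichever multi-label learner is chosen as the meta-model.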
