
mldr.resampling: Efficient Reference Implementations of Multilabel Resampling Algorithms

Antonio J. Rivera, Miguel A. Dávila, David Elizondo, María J. del Jesus, Francisco Charte

TL;DR

The paper tackles imbalanced multilabel learning (MLL), where label frequencies differ drastically and frequent and infrequent labels often co-occur in the same instances. It presents mldr.resampling, an R package that provides eleven reference implementations of multilabel resampling algorithms, integrated with the mldr ecosystem and optimized for speed through neighbor caching and parallelization. The package exposes a unified resample() interface that can apply several algorithms in a single call. The paper demonstrates its use on common multilabel datasets (MLDs) such as emotions, showing improvements in imbalance measures such as MeanIR and SCUMBLE. By offering public, efficient, and extensible reference implementations, the work lowers the barrier to applying resampling in MLL and makes fair comparisons between methods easier.
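
MeanIR is usually defined from a per-label imbalance ratio, IRLbl. For each label, IRLbl is the number of occurrences of the most frequent label divided by the number of occurrences of that label. MeanIR is the average of these ratios over all labels. The Python sketch below computes both from a binary label matrix under that assumed definition. It is an illustrative reimplementation, not code from mldr or mldr.resampling, and the function names irlbl and mean_ir are hypothetical.

```python
from statistics import mean

def irlbl(labels):
    """Per-label imbalance ratio (IRLbl) for a binary label matrix.

    labels: list of rows, one per instance; each row holds one 0/1 value
    per label. Assumes every label appears in at least one instance.
    """
    counts = [sum(col) for col in zip(*labels)]  # occurrences of each label
    most_frequent = max(counts)
    return [most_frequent / c for c in counts]

def mean_ir(labels):
    """MeanIR: the average of the IRLbl values over all labels."""
    return mean(irlbl(labels))

# Toy label matrix (hypothetical data): 5 instances, 3 labels.
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
]
print(irlbl(Y))    # [1.0, 2.5, 5.0]
print(mean_ir(Y))  # about 2.83
```

A MeanIR close to 1 indicates balanced label frequencies. Larger values mean that some labels are far rarer than the most frequent one.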

Abstract

Resampling algorithms are a useful approach to deal with imbalanced learning in multilabel scenarios. These methods have to deal with singularities in the multilabel data, such as the occurrence of frequent and infrequent labels in the same instance. Implementations of these methods are sometimes limited to the pseudocode provided by their authors in a paper. This Original Software Publication presents mldr.resampling, a software package that provides reference implementations for eleven multilabel resampling methods, with an emphasis on efficiency since these algorithms are usually time-consuming.
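
The SCUMBLE measure quantifies the concurrence of frequent and infrequent labels in the same instance. A commonly cited formulation is assumed here. An instance's score is one minus the ratio between the geometric mean and the arithmetic mean of the IRLbl values of its active labels, and the dataset score is the average of the instance scores. The Python sketch below uses that assumed formulation and takes precomputed IRLbl values as input. Instances with no active labels are scored 0, which is a hypothetical convention. The function name scumble is also hypothetical, and the paper's exact equation may differ in details.

```python
import math

def scumble(labels, irlbl_values):
    """SCUMBLE under the assumed formulation (1 - geometric/arithmetic mean).

    labels:        binary label matrix (list of 0/1 rows, one per instance).
    irlbl_values:  per-label imbalance ratios, one value per label column.
    """
    scores = []
    for row in labels:
        active = [irlbl_values[j] for j, v in enumerate(row) if v]
        if not active:
            scores.append(0.0)  # assumed convention: no labels, no concurrence
            continue
        arith = sum(active) / len(active)
        # Geometric mean computed in log space to avoid overflow on long rows.
        geo = math.exp(sum(math.log(x) for x in active) / len(active))
        scores.append(1.0 - geo / arith)
    return sum(scores) / len(scores)

# Toy data (hypothetical): label counts are 5, 2 and 1, so IRLbl = 5/5, 5/2, 5/1.
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
]
IRLBL = [1.0, 2.5, 5.0]
print(scumble(Y, IRLBL))  # about 0.0895
```

An instance whose active labels all have similar IRLbl values scores near 0. An instance that mixes a very frequent label with a very rare one, like the fourth row above, scores higher.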

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Mean imbalance ratio of MLDs obtained from the Cometa dataset repository. Datasets having MeanIR below 35 or above 1000 have been excluded.
  • Figure 2: This concurrence diagram shows that all instances containing any of three minority labels (on the right side of the plot) also contain one or more majority labels.
  • Figure 3: Files produced by the resample() function after running four resampling algorithms over one dataset.
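
The TL;DR attributes part of the package's speed to neighbor caching, and Figure 3 shows resample() running four algorithms over one dataset. The Python sketch below illustrates the general idea of caching. It assumes that several neighbor-based algorithms need the same k-nearest-neighbor lists, so those lists are computed once per value of k and then reused. The class NeighborCache and the algorithm names are hypothetical. This is not how mldr.resampling is implemented internally, and its actual caching strategy may differ.

```python
import math

class NeighborCache:
    """Hypothetical cache of k-nearest-neighbor lists, keyed by k."""

    def __init__(self, points):
        self.points = points   # feature vectors, one tuple per instance
        self._cache = {}       # k -> list of neighbor-index lists
        self.computations = 0  # how many times neighbors were actually computed

    def neighbors(self, k):
        """Return the k nearest neighbors of every instance, computing them once."""
        if k not in self._cache:
            self.computations += 1
            self._cache[k] = [self._knn(i, k) for i in range(len(self.points))]
        return self._cache[k]

    def _knn(self, i, k):
        # Brute-force Euclidean search that excludes the instance itself.
        others = [j for j in range(len(self.points)) if j != i]
        others.sort(key=lambda j: math.dist(self.points[i], self.points[j]))
        return others[:k]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
cache = NeighborCache(X)

# Two hypothetical neighbor-based algorithms sharing the same cache.
for algorithm in ("algorithm_a", "algorithm_b"):
    nn = cache.neighbors(k=2)
    print(algorithm, nn[0])  # neighbors of instance 0: [1, 2]

print(cache.computations)  # 1: neighbor lists were computed only once
```

Neighbor search is often the costliest step of neighbor-based resampling. Sharing its result across algorithms that run on the same dataset avoids repeating that cost.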