mldr.resampling: Efficient Reference Implementations of Multilabel Resampling Algorithms
Antonio J. Rivera, Miguel A. Dávila, David Elizondo, María J. del Jesus, Francisco Charte
TL;DR
The paper addresses imbalanced multilabel learning (MLL), in which label frequencies differ drastically and frequent and infrequent labels often co-occur in the same instances. It presents mldr.resampling, an R package that provides reference implementations of eleven multilabel resampling algorithms, optimized for speed through neighbor caching and parallelization and integrated with the mldr ecosystem. The package exposes a unified resample() interface that can apply several algorithms in a single call, and the paper demonstrates its use on common multilabel datasets (MLDs) such as emotions, reporting improvements in imbalance metrics such as MeanIR and SCUMBLE. By offering public, efficient, and extensible reference implementations, the work lowers the barrier to applying resampling in MLL and facilitates fair comparisons across methods.
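To make the MeanIR metric mentioned above concrete, here is a minimal Python sketch. It assumes the usual definition from the multilabel imbalance literature: the IRLbl of a label is the count of the most frequent label divided by that label's count, and MeanIR is the average of IRLbl over all labels. The function name `mean_ir` and the toy label matrix are hypothetical illustrations. They are not part of mldr.resampling, which is an R package whose API is not reproduced here.

```python
def mean_ir(label_matrix):
    """Return (IRLbl per label, MeanIR) for a binary instance-by-label matrix.

    Assumed definition: IRLbl(l) = max label count / count of label l,
    and MeanIR = mean of IRLbl over all labels.
    """
    n_labels = len(label_matrix[0])
    # Number of instances in which each label is active.
    counts = [sum(row[j] for row in label_matrix) for j in range(n_labels)]
    if min(counts) == 0:
        raise ValueError("every label must appear in at least one instance")
    majority = max(counts)
    irlbl = [majority / c for c in counts]
    return irlbl, sum(irlbl) / n_labels


# Hypothetical toy dataset: 4 instances, 3 labels.
# Label 0 is frequent, label 2 is rare, and they co-occur in the last row.
toy = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]

irlbl, meanir = mean_ir(toy)

# Naive oversampling: clone the only instance that carries the rare label.
oversampled = toy + [toy[3]]
irlbl_after, meanir_after = mean_ir(oversampled)

print("MeanIR before:", meanir)
print("MeanIR after cloning the rare-label instance:", meanir_after)
```

In this toy case, cloning the instance that carries the rare label lowers MeanIR, but it also copies the frequent label that co-occurs with it. This is the kind of difficulty that multilabel-specific resampling methods have to handle.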
Abstract
Resampling algorithms are a useful approach to imbalanced learning in multilabel scenarios. These methods must cope with singularities of multilabel data, such as the occurrence of frequent and infrequent labels in the same instance. Implementations of these methods are sometimes limited to the pseudocode their authors provide in a paper. This Original Software Publication presents mldr.resampling, a software package that provides reference implementations of eleven multilabel resampling methods, with an emphasis on efficiency because these algorithms are usually time-consuming.
