A comparative analysis of rank aggregation methods for the partial label ranking problem
Jiayi Wang, Juan C. Alfaro, Viktor Bengs
TL;DR
This work reevaluates rank aggregation for partial label ranking by reintroducing scoring-based and probabilistic-based methods and adapting them to produce meaningful ties. Across synthetic and real-world datasets, scoring-based approaches (notably Borda and Copeland with tuned tie thresholds) consistently outperform the current state-of-the-art bucket pivot method, especially when information is incomplete. Non-parametric probabilistic methods underperform and incur a higher computational cost, which suggests they have limited practical value without further modeling. The study also provides practical guidance for hyperparameter selection and open-source code to support reproducibility and further research in partial label ranking.
Abstract
The label ranking problem is a supervised learning scenario in which the learner predicts a total order of the class labels for a given input instance. Recently, research has increasingly focused on the partial label ranking problem, a generalization of the label ranking problem that allows ties in the predicted orders. So far, most existing learning approaches for the partial label ranking problem rely on approximation algorithms for rank aggregation in the final prediction step. This paper explores several alternative aggregation methods for this critical step, including scoring-based and non-parametric probabilistic-based rank aggregation approaches. To enhance their suitability for the more general partial label ranking problem, the investigated methods are extended to increase the likelihood of producing ties. Experimental evaluations on standard benchmarks demonstrate that scoring-based variants consistently outperform the current state-of-the-art method in handling incomplete information. In contrast, non-parametric probabilistic-based variants fail to achieve competitive performance.
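The abstract does not say how a scoring-based method is made to produce ties. As a hedged illustration only, the sketch below assumes one plausible scheme and is not necessarily the authors' procedure. It aggregates complete bucket orders (lists of tied groups) with Borda or Copeland scores computed from a pairwise preference matrix, where a tie contributes half a point. It then sorts labels by score and places a label in the current bucket when its normalized score gap to that bucket's top label is at most a tie threshold. The names `pairwise_matrix`, `scores`, `aggregate`, and `threshold`, as well as the gap rule itself, are hypothetical. The sketch also assumes that every input ranking contains every label, so it does not model the incomplete-information setting.

```python
from itertools import permutations

# Hypothetical sketch, not the paper's code. A bucket order is a list of
# buckets (lists of labels); labels in the same bucket are tied.
# Assumption: every input ranking contains every label.
BucketOrder = list[list[str]]


def pairwise_matrix(rankings: list[BucketOrder], labels: list[str]) -> dict:
    """P[a][b] = fraction of rankings placing a before b (a tie counts 0.5)."""
    P = {a: {b: 0.0 for b in labels if b != a} for a in labels}
    for ranking in rankings:
        pos = {label: i for i, bucket in enumerate(ranking) for label in bucket}
        for a, b in permutations(labels, 2):
            if pos[a] < pos[b]:
                P[a][b] += 1.0
            elif pos[a] == pos[b]:
                P[a][b] += 0.5
    for a in labels:
        for b in P[a]:
            P[a][b] /= len(rankings)
    return P


def scores(P: dict, method: str) -> dict:
    if method == "borda":
        # Average Borda count: half a point for each tied opponent.
        return {a: sum(row.values()) for a, row in P.items()}
    if method == "copeland":
        # One point per pairwise majority win, half a point per pairwise draw.
        return {
            a: sum(
                1.0 if P[a][b] > P[b][a] else 0.5 if P[a][b] == P[b][a] else 0.0
                for b in row
            )
            for a, row in P.items()
        }
    raise ValueError(f"unknown method: {method}")


def aggregate(rankings, labels, method="borda", threshold=0.0) -> BucketOrder:
    """Sort labels by score. A label joins the current bucket when its score gap
    to the bucket's top label, normalized to [0, 1], is at most `threshold`."""
    s = scores(pairwise_matrix(rankings, labels), method)
    max_score = max(len(labels) - 1, 1)  # both scores lie in [0, n - 1]
    buckets: BucketOrder = []
    for label in sorted(labels, key=lambda a: (-s[a], a)):
        if buckets and (s[buckets[-1][0]] - s[label]) / max_score <= threshold:
            buckets[-1].append(label)
        else:
            buckets.append([label])
    return buckets


rankings = [
    [["a"], ["b", "c"], ["d"]],
    [["a", "b"], ["c"], ["d"]],
    [["b"], ["a"], ["c", "d"]],
]
labels = ["a", "b", "c", "d"]
print(aggregate(rankings, labels, "borda", threshold=0.1))     # [['a', 'b'], ['c'], ['d']]
print(aggregate(rankings, labels, "borda", threshold=0.0))     # [['a'], ['b'], ['c'], ['d']]
print(aggregate(rankings, labels, "copeland", threshold=0.0))  # [['a', 'b'], ['c'], ['d']]
```

The threshold governs how readily ties appear. A value of 0 ties only labels with exactly equal scores, and larger values merge labels with nearby scores into the same bucket. This is why such a parameter would need tuning per dataset.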
