A comparative analysis of rank aggregation methods for the partial label ranking problem

Jiayi Wang, Juan C. Alfaro, Viktor Bengs

TL;DR

This work reevaluates rank aggregation for partial label ranking by reintroducing scoring-based and probabilistic-based methods and adapting them to produce meaningful ties. Across synthetic and real-world datasets, scoring-based approaches (notably Borda and Copeland with tuned tie thresholds) consistently outperform the current state-of-the-art bucket pivot method, especially when information is incomplete. Non-parametric probabilistic methods underperform and incur higher computational cost, indicating limited practical value without further modeling. The study also provides practical guidance for hyperparameter selection, along with open-source code to support reproducibility and further research in partial label ranking.

Abstract

The label ranking problem is a supervised learning scenario in which the learner predicts a total order of the class labels for a given input instance. Recently, research has increasingly focused on the partial label ranking problem, a generalization of the label ranking problem that allows ties in the predicted orders. So far, most existing learning approaches for the partial label ranking problem rely on approximation algorithms for rank aggregation in the final prediction step. This paper explores several alternative aggregation methods for this critical step, including scoring-based and non-parametric probabilistic-based rank aggregation approaches. To enhance their suitability for the more general partial label ranking problem, the investigated methods are extended to increase the likelihood of producing ties. Experimental evaluations on standard benchmarks demonstrate that scoring-based variants consistently outperform the current state-of-the-art method in handling incomplete information. In contrast, non-parametric probabilistic-based variants fail to achieve competitive performance.
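To make the idea of a scoring-based aggregation that is extended to produce ties more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a normalized Borda score averaged over the rankings in which a label appears, and a simple gap rule that ties a label with its predecessor when their scores differ by at most a threshold. The function names `borda_scores` and `to_bucket_order`, the normalization, and the gap rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def borda_scores(
    rankings: Sequence[Sequence[str]], labels: Sequence[str]
) -> Dict[str, float]:
    """Average normalized Borda score per label (assumed scoring scheme).

    Each ranking lists labels from best to worst and may omit labels
    (incomplete information). A label earns 1.0 for first place and 0.0
    for last place within each ranking that contains it.
    """
    totals = {label: 0.0 for label in labels}
    counts = {label: 0 for label in labels}
    for ranking in rankings:
        n = len(ranking)
        if n < 2:
            continue  # a single label carries no pairwise information
        for position, label in enumerate(ranking):
            totals[label] += (n - 1 - position) / (n - 1)
            counts[label] += 1
    # Labels that never appear get a score of 0.0 (an assumption).
    return {
        label: (totals[label] / counts[label] if counts[label] else 0.0)
        for label in labels
    }


def to_bucket_order(scores: Dict[str, float], threshold: float) -> List[List[str]]:
    """Group labels into buckets (ties), best bucket first.

    A label joins the current bucket when its score is within `threshold`
    of the previous label's score (hypothetical tie rule).
    """
    ordered = sorted(scores, key=lambda label: (-scores[label], label))
    buckets: List[List[str]] = []
    for label in ordered:
        if buckets and scores[buckets[-1][-1]] - scores[label] <= threshold:
            buckets[-1].append(label)
        else:
            buckets.append([label])
    return buckets


if __name__ == "__main__":
    rankings = [["a", "b", "c"], ["a", "c", "b"]]
    scores = borda_scores(rankings, ["a", "b", "c"])
    print(to_bucket_order(scores, threshold=0.0))  # [['a'], ['b', 'c']]
```

In this sketch, a larger threshold merges more labels into the same bucket, which is one way a tie threshold could act as a tunable hyperparameter.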

Paper Structure

This paper contains 20 sections, 6 equations, 5 figures, 11 tables, and 7 algorithms.

Figures (5)

  • Figure 1: Example of the complete learning and inference process in the partial label ranking problem
  • Figure 2: Average $\tau_X$ score across synthetic datasets for different $\beta$ values, comparing algorithms under varying percentages of missing class labels
  • Figure 3: Average $\tau_X$ score across synthetic datasets for varying percentages of missing class labels, comparing algorithms
  • Figure 4: Average bucket count difference between the true and predicted bucket orders across algorithms and a subset of datasets, based on complete rankings
  • Figure 5: Average bucket count difference between the true and predicted bucket orders for all aggregation algorithms across all datasets and missing label settings