Rank Aggregation in Crowdsourcing for Listwise Annotations

Wenshui Luo, Haoyu Liu, Yongliang Ding, Tao Zhou, Sheng Wan, Runze Wu, Minmin Lin, Cong Zhang, Changjie Fan, Chen Gong

TL;DR

To our knowledge, LAC is the first work to directly address the full rank aggregation problem in listwise crowdsourcing and to simultaneously infer the difficulty of problems, the ability of annotators, and the ground-truth ranks in an unsupervised way.

Abstract

Rank aggregation through crowdsourcing has recently gained significant attention, particularly in the context of listwise ranking annotations. However, existing methods primarily focus on a single problem and partial ranks, while the aggregation of listwise full ranks across numerous problems remains largely unexplored. This scenario is relevant to various applications, such as model quality assessment and reinforcement learning with human feedback. In light of these practical needs, we propose LAC, a Listwise rank Aggregation method in Crowdsourcing, in which global position information is carefully measured and included. In our design, a specially designed annotation quality indicator measures the discrepancy between the annotated rank and the true rank. We also take the difficulty of the ranking problem itself into consideration, as it directly impacts the performance of annotators and consequently influences the final results. To our knowledge, LAC is the first work to directly address the full rank aggregation problem in listwise crowdsourcing and to simultaneously infer the difficulty of problems, the ability of annotators, and the ground-truth ranks in an unsupervised way. To evaluate our method, we collect a real-world business-oriented dataset for paragraph ranking. Experimental results on both synthetic and real-world benchmark datasets demonstrate the effectiveness of the proposed LAC method.
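To make the problem setting concrete, the following is a minimal Python sketch of listwise full rank aggregation across several problems. It is not the LAC method: it substitutes a plain Borda count for the aggregation step and the Spearman footrule distance for the annotation quality indicator, and it does not model problem difficulty. The names `borda_aggregate`, `footrule`, and the toy `annotations` data are assumptions made for illustration only.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate full rankings (best item first) with a Borda count.

    Baseline stand-in for illustration, not the LAC estimator.
    Score ties are broken by item name so the output is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position  # top item earns n - 1 points
    return sorted(scores, key=lambda item: (-scores[item], item))


def footrule(rank_a, rank_b):
    """Spearman footrule: total absolute position shift between two ranks."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(rank_a))


# Hypothetical toy data: annotations[problem][annotator] = full rank.
annotations = {
    "p1": {
        "ann1": ["A", "B", "C", "D"],
        "ann2": ["A", "B", "D", "C"],
        "ann3": ["B", "A", "C", "D"],
    },
    "p2": {
        "ann1": ["A", "B", "C", "D"],
        "ann2": ["A", "C", "B", "D"],
        "ann3": ["D", "C", "B", "A"],
    },
}

# One consensus rank per problem.
consensus = {
    pid: borda_aggregate(list(by_annotator.values()))
    for pid, by_annotator in annotations.items()
}

# Per-annotator discrepancy from the consensus, summed over problems.
discrepancy = defaultdict(int)
for pid, by_annotator in annotations.items():
    for annotator, ranking in by_annotator.items():
        discrepancy[annotator] += footrule(ranking, consensus[pid])

print(consensus)
print(dict(discrepancy))
```

In this toy data the third annotator accumulates the largest discrepancy, which is the kind of signal an ability estimate could build on. A method in the spirit of the abstract would infer the ranks, annotator abilities, and problem difficulties jointly rather than in this one-pass fashion.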

Paper Structure

This paper contains 18 sections, 25 equations, 5 figures, 9 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Comparison between different rank aggregation paradigms. In this figure, $A\sim Z$ or $A_i\sim D_i$ are items to be ranked, and their ground-truth ranks are determined according to the alphabetical order. (a): In previous rank aggregation settings, the ground-truth rank of a single long sequence should be determined. In pointwise methods, only ratings for the assigned items are provided by each annotator, while in pairwise methods, the relative rankings within pairs of items are provided. For the listwise partial rank aggregation task, each annotator receives a subset of items and provides the corresponding ranks over them. (b): In the context of listwise full rank aggregation, there are multiple short sequences, and the items within each sequence remain to be ranked. During the annotation process, each annotator is assigned several sequences and is required to annotate complete ranks for the items in each sequence.
  • Figure 2: Probabilistic graphical model representation of the proposed LAC method.
  • Figure 3: Overall estimation error of LAC with various $e$.
  • Figure 4: Comparisons between the ground-truth ability matrices of annotators (left) and the estimated ones (right), which illustrate that our estimated confusion matrices are similar to the corresponding ground-truth ones in most cases.
  • Figure 5: Test accuracy curves on ParaRank with different numbers of examples and different numbers of annotators for all the compared methods.
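The annotation formats contrasted in the Figure 1 caption can be related to one another with a short Python sketch. It shows how a single listwise full rank implies both the pairwise comparisons and any partial rank over a subset of its items. The helper names `to_pairwise` and `to_partial` and the example items are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def to_pairwise(ranking):
    """List every (higher, lower) pair implied by a best-first ranking."""
    # combinations() preserves input order, so the first element of each
    # pair is always the item ranked higher.
    return list(combinations(ranking, 2))


def to_partial(ranking, subset):
    """Restrict a full rank to a subset of items, keeping relative order."""
    keep = set(subset)
    return [item for item in ranking if item in keep]


full_rank = ["A", "B", "C", "D"]          # listwise full annotation
pairs = to_pairwise(full_rank)            # pairwise view of the same rank
partial = to_partial(full_rank, ["D", "B"])  # listwise partial view

print(pairs)
print(partial)
```

A full rank over four items therefore carries six pairwise comparisons at once, which is one reason the listwise full setting is treated as a distinct aggregation problem from the pairwise and partial settings.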