OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation

Yehua Huang; Penglei Sun; Zebin Chen; Zhenheng Tang; Xiaowen Chu

TL;DR

OmniReview tackles data scarcity and evaluation realism in reviewer recommendation by constructing a large, multi-source benchmark linked to scholar graphs. It introduces Pro-MMoE, an LLM-enhanced, multi-task framework that couples semantic profiling with a task-adaptive mixture-of-experts to optimize recall, discrimination, and ranking within a unified objective. Empirical results on the OmniReview benchmark show state-of-the-art performance on six of seven metrics, highlighting improved ground-truth reviewer retrieval, hard-negative discrimination, and precise candidate ranking. This work provides a valuable, scalable resource and a practical framework to advance realistic and transparent editorial workflows in academic peer review.

Abstract

Academic peer review remains the cornerstone of scholarly validation, yet reviewer recommendation research faces challenges in both data and methods. From the data perspective, existing research is hindered by the scarcity of large-scale, verified benchmarks and by oversimplified evaluation metrics that fail to reflect real-world editorial workflows. To bridge this gap, we present OmniReview, a comprehensive dataset constructed by integrating multi-source academic platforms and linking them to comprehensive scholarly profiles through a disambiguation pipeline, yielding 202,756 verified review records. Based on this data, we introduce a three-tier hierarchical evaluation framework to assess recommendations from recall to precise expert identification. From the method perspective, existing embedding-based approaches suffer from the information bottleneck of semantic compression and from limited interpretability. To resolve these limitations, we propose Profiling Scholars with Multi-gate Mixture-of-Experts (Pro-MMoE), a novel framework that combines Large Language Models (LLMs) with Multi-task Learning. Specifically, it uses LLM-generated semantic profiles to preserve fine-grained expertise nuances and interpretability, while employing a Task-Adaptive MMoE architecture to dynamically balance conflicting evaluation goals. Comprehensive experiments demonstrate that Pro-MMoE achieves state-of-the-art performance on six of seven metrics, establishing a new benchmark for realistic reviewer recommendation.
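To make the multi-gate mixture-of-experts idea in the abstract more concrete, the following is a minimal sketch of a generic MMoE forward pass in plain Python. It is not the authors' Pro-MMoE implementation: the class name `TinyMMoE`, the layer sizes, the random initialization, the three task names, and the four-dimensional input vector are all assumptions chosen for illustration. The sketch only shows the general pattern in which shared experts process one paper-reviewer feature vector and each task mixes the expert outputs with its own softmax gate before a task-specific tower produces a score.

```python
import math
import random


def linear(x, W, b):
    """Affine map: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random weights and zero bias (illustrative initialization only)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class TinyMMoE:
    """Hypothetical toy MMoE: shared experts, one gate and one tower per task."""

    def __init__(self, in_dim, hidden_dim, n_experts, task_names, seed=0):
        rng = random.Random(seed)
        self.experts = [init_layer(rng, hidden_dim, in_dim) for _ in range(n_experts)]
        self.gates = {t: init_layer(rng, n_experts, in_dim) for t in task_names}
        self.towers = {t: init_layer(rng, 1, hidden_dim) for t in task_names}

    def gate_weights(self, x, task):
        # Each task has its own softmax distribution over the shared experts.
        return softmax(linear(x, *self.gates[task]))

    def forward(self, x):
        expert_out = [relu(linear(x, W, b)) for W, b in self.experts]
        scores = {}
        for task in self.towers:
            g = self.gate_weights(x, task)
            # Task-specific weighted sum of the expert outputs.
            mixed = [
                sum(g[k] * expert_out[k][j] for k in range(len(expert_out)))
                for j in range(len(expert_out[0]))
            ]
            scores[task] = linear(mixed, *self.towers[task])[0]
        return scores


# Assumed setup: three experts and three illustrative task heads.
model = TinyMMoE(
    in_dim=4,
    hidden_dim=8,
    n_experts=3,
    task_names=["recall", "discrimination", "ranking"],
)
pair_features = [0.2, -0.1, 0.7, 0.05]  # made-up paper-reviewer feature vector
scores = model.forward(pair_features)
```

Because every task owns a separate gate, two tasks can weight the same experts differently for the same input, which is the general mechanism an MMoE uses to trade off objectives that would otherwise compete for a single shared representation.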

Paper Structure (41 sections, 4 equations, 9 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Reviewer Recommendation Dataset
Recommendation Methods
Dataset Construction
Definition
Entity Alignment
Data Cleaning
Publication Matching
Scholar Matching
Verification
Construction of the Discipline Taxonomy
Task Statement
Ground-truth Reviewers Retrieval
Elimination of Unqualified Candidates
...and 26 more sections

Figures (9)

Figure 1: Comparison of existing datasets and our proposed OmniReview dataset. Conf. denotes score confidence for the recommendation of reviewers.
Figure 2: Disambiguation Flowchart
Figure 3: (a) The overview of the Pro-MMoE architecture, comprising three main modules: Profiling Module, Multi-gate Mixture-of-Experts Module, and Task-specific Prediction Towers. Conf. refers to confidence and Rank. refers to ranking. (b) Detailed illustration of the review candidate embedding process. (c) Expert network structure within MoE, featuring multiple shared subnetworks. (d) Task-specific DNN towers for specific tasks.
Figure 4: Impact of the number of experts in MMoE. MMoE with 3 experts shows the optimal trade-off between recommendation quality and computational efficiency.
Figure 5: Performance trends across different training data volume ratios. The model maintains robust performance even with reduced data volumes, showing only moderate degradation in RRC, UCC, and NDCG metrics as the data volume ratio decreases from 1.0 to 0.05.
...and 4 more figures
