OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation
Yehua Huang, Penglei Sun, Zebin Chen, Zhenheng Tang, Xiaowen Chu
TL;DR
OmniReview tackles data scarcity and evaluation realism in reviewer recommendation by constructing a large, multi-source benchmark linked to scholar graphs. It introduces Pro-MMoE, an LLM-enhanced, multi-task framework that couples semantic profiling with a task-adaptive mixture-of-experts to optimize recall, discrimination, and ranking within a unified objective. Empirical results on the OmniReview benchmark show state-of-the-art performance on six of seven metrics, highlighting improved ground-truth reviewer retrieval, hard-negative discrimination, and precise candidate ranking. This work provides a valuable, scalable resource and a practical framework to advance realistic and transparent editorial workflows in academic peer review.
Abstract
Academic peer review remains the cornerstone of scholarly validation, yet research on reviewer recommendation faces challenges in both data and methods. From the data perspective, existing research is hindered by the scarcity of large-scale, verified benchmarks and by oversimplified evaluation metrics that fail to reflect real-world editorial workflows. To bridge this gap, we present OmniReview, a comprehensive dataset constructed by integrating multiple academic platforms, encompassing comprehensive scholarly profiles, through a disambiguation pipeline, yielding 202,756 verified review records. Based on this data, we introduce a three-tier hierarchical evaluation framework that assesses recommendations from recall to precise expert identification. From the method perspective, existing embedding-based approaches suffer from the information bottleneck of semantic compression and from limited interpretability. To resolve these limitations, we propose Profiling Scholars with Multi-gate Mixture-of-Experts (Pro-MMoE), a novel framework that combines Large Language Models (LLMs) with multi-task learning. Specifically, it uses LLM-generated semantic profiles to preserve fine-grained expertise nuances and interpretability, while employing a Task-Adaptive MMoE architecture to dynamically balance conflicting evaluation goals. Comprehensive experiments demonstrate that Pro-MMoE achieves state-of-the-art performance on six of seven metrics, establishing a new benchmark for realistic reviewer recommendation.
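To make the multi-gate mixture-of-experts idea concrete, the following is a minimal sketch of a generic MMoE forward pass in plain Python. It is not the Pro-MMoE implementation: the class name, dimensions, random weights, and the choice of three tasks (standing in for recall, discrimination, and ranking) are all illustrative assumptions. It only shows the general pattern in which shared experts are mixed by a separate softmax gate per task.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MMoE:
    """Hypothetical toy MMoE: shared experts, one gate and one tower per task."""

    def __init__(self, in_dim, expert_dim, n_experts, n_tasks, seed=0):
        rng = random.Random(seed)
        # Experts are shared by all tasks.
        self.experts = [random_matrix(expert_dim, in_dim, rng) for _ in range(n_experts)]
        # Each task has its own gate producing one weight per expert.
        self.gates = [random_matrix(n_experts, in_dim, rng) for _ in range(n_tasks)]
        # Each task has its own tower mapping the mixed representation to a score.
        self.towers = [random_matrix(1, expert_dim, rng) for _ in range(n_tasks)]

    def gate_weights(self, x):
        """Per-task softmax distribution over experts for input x."""
        return [softmax(linear(g, x)) for g in self.gates]

    def forward(self, x):
        expert_out = [relu(linear(e, x)) for e in self.experts]
        dim = len(expert_out[0])
        scores = []
        for gate, tower in zip(self.gate_weights(x), self.towers):
            # Task-specific weighted sum of the shared expert outputs.
            mixed = [sum(w * out[j] for w, out in zip(gate, expert_out)) for j in range(dim)]
            scores.append(linear(tower, mixed)[0])
        return scores


# Assumed setup: 4 input features, 2 experts, 3 tasks.
model = MMoE(in_dim=4, expert_dim=3, n_experts=2, n_tasks=3)
features = [0.2, -0.1, 0.7, 0.5]  # made-up paper-reviewer pair features
task_scores = model.forward(features)
gate_mix = model.gate_weights(features)
```

Because each task has its own gate, the same input can weight the shared experts differently per objective, which is the general mechanism by which an MMoE can trade off competing goals. How Pro-MMoE makes its gating task-adaptive is not specified in the abstract and is not modeled here.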
