NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation
Penghai Zhao, Jinyu Tian, Qinghua Xing, Xin Zhang, Zheng Li, Jianjun Qian, Ming-Ming Cheng, Xiang Li
TL;DR
NAIPv2 targets scalable, unbiased paper quality estimation. It introduces a debiased pairwise learning framework built on a confidence-aware Review Tendency Signal (RTS), together with NAIDv2, a large-scale dataset organized by domain and year to support debiased training. Training uses pairwise comparisons within domain-year clusters, while inference is decoupled from training, so deployment requires only fast, linear-time pointwise scoring. Empirically, it achieves state-of-the-art performance (AUC $0.782$, Spearman $\rho=0.432$), generalizes well to unseen NeurIPS submissions, and remains robust to distribution shift. This work advances automated scholarly assessment and supports scalable, reliable literature intelligence in real-world settings.
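As a rough illustration of training on within-cluster pairs while scoring pointwise at inference, the sketch below uses a linear scorer and a logistic (Bradley-Terry-style) pairwise loss. The `Paper` record, its toy feature vectors, the helper names, and the linear model are all hypothetical stand-ins; NAIPv2's actual model and loss are not described in this summary. Pairs are formed only between papers that share a (domain, year) key, so rating-scale differences between clusters never enter the loss, and at inference each paper is scored on its own, so scoring n papers costs O(n) rather than the O(n^2) of comparing all pairs.

```python
import math
import random
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Paper:
    # Hypothetical record; NAIDv2's real fields are not specified here.
    domain: str
    year: int
    features: list  # stand-in for a learned paper representation
    target: float   # supervision value, e.g. an RTS-like score


def within_group_pairs(papers):
    """Return (better, worse) index pairs formed only inside each (domain, year) group."""
    groups = defaultdict(list)
    for idx, p in enumerate(papers):
        groups[(p.domain, p.year)].append(idx)
    pairs = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            if papers[i].target == papers[j].target:
                continue  # a tie carries no ordering information
            pairs.append((i, j) if papers[i].target > papers[j].target else (j, i))
    return pairs


def score(w, x):
    """Pointwise score: one dot product per paper, so n papers cost O(n)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(papers, epochs=200, lr=0.1, seed=0):
    """SGD on the logistic pairwise loss -log sigmoid(s_better - s_worse)."""
    rng = random.Random(seed)
    dim = len(papers[0].features)
    w = [0.0] * dim
    pairs = within_group_pairs(papers)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for b, c in pairs:
            xb, xc = papers[b].features, papers[c].features
            margin = score(w, xb) - score(w, xc)
            # -dLoss/dmargin = sigmoid(-margin), written with tanh for numerical stability
            g = 0.5 * (1.0 - math.tanh(margin / 2.0))
            for k in range(dim):
                w[k] += lr * g * (xb[k] - xc[k])
    return w


if __name__ == "__main__":
    toy = [
        Paper("ML", 2023, [1.0, 0.2], 6.5),
        Paper("ML", 2023, [0.2, 0.1], 4.0),
        Paper("CV", 2024, [0.9, 0.5], 5.0),
        Paper("CV", 2024, [0.1, 0.4], 3.0),
    ]
    print(within_group_pairs(toy))
    weights = train_pairwise(toy)
    # Inference needs no pairs: each paper is scored independently.
    for p in toy:
        print(p.domain, p.year, round(score(weights, p.features), 3))
```

In the toy data, the ML-2023 paper with target 4.0 and the CV-2024 paper with target 5.0 are never paired, even though their raw targets differ; only orderings inside each cluster are learned.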
Abstract
The ability to estimate the quality of scientific papers is central to how both humans and AI systems will advance scientific knowledge. However, existing LLM-based estimation methods incur high inference costs, whereas faster direct score regression is limited by scale inconsistencies. We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings and introduces the Review Tendency Signal (RTS), a probabilistic integration of reviewer scores and confidences. To support training and evaluation, we further construct NAIDv2, a large-scale dataset of 24,276 ICLR submissions enriched with metadata and detailed structured content. Although trained on pairwise comparisons, NAIPv2 performs efficient pointwise prediction at deployment, achieving state-of-the-art performance (78.2% AUC, 0.432 Spearman) while retaining linear-time inference. On unseen NeurIPS submissions, it also generalizes strongly, with predicted scores increasing consistently across decision categories from Rejected to Oral. These findings establish NAIPv2 as a debiased and scalable framework for automated paper quality estimation, marking a step toward future scientific intelligence systems. Code and dataset are released at sway.cloud.microsoft/Pr42npP80MfPhvj8.
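The abstract does not spell out how RTS combines scores and confidences. The sketch below is one plausible reading of a "probabilistic integration", not the paper's definition: it assumes each reviewer's score is a Gaussian whose variance shrinks as confidence grows, fuses the reviews by precision weighting, and reports the probability that the fused quality exceeds a threshold. The function names and the `threshold` and `base_var` parameters are hypothetical.

```python
import math


def fuse_reviews(scores, confidences, base_var=1.0):
    """Precision-weighted Gaussian fusion of reviewer scores.

    Assumption (not from the paper): reviewer r reports N(score_r, base_var / conf_r),
    so a more confident reviewer contributes a tighter distribution. Treating these as
    independent measurements of one latent quality gives a Gaussian whose precision is
    the sum of the reviewer precisions and whose mean is their precision-weighted mean.
    """
    if not scores or len(scores) != len(confidences):
        raise ValueError("need one confidence per score")
    if any(c <= 0 for c in confidences):
        raise ValueError("confidences must be positive")
    precisions = [c / base_var for c in confidences]
    total = sum(precisions)
    mean = sum(p * s for p, s in zip(precisions, scores)) / total
    return mean, 1.0 / total  # fused mean and fused variance


def review_tendency(scores, confidences, threshold=5.0, base_var=1.0):
    """Hypothetical RTS-style value: P(fused quality > threshold)."""
    mean, var = fuse_reviews(scores, confidences, base_var)
    z = (mean - threshold) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z


if __name__ == "__main__":
    # Same scores, different confidence patterns.
    print(review_tendency([6, 6, 3], [4, 4, 1]))  # confident positive reviewers dominate
    print(review_tendency([6, 6, 3], [1, 1, 5]))  # confident negative reviewer dominates
```

Under this assumption, identical scores can yield very different signals depending on who was confident, which is the kind of behavior a confidence-aware signal is meant to capture.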

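The headline results are an AUC and a Spearman correlation, but the abstract does not state which labels they are computed against. Assuming, as is common, that AUC compares predicted scores with binary outcomes (for example accept = 1, reject = 0) and Spearman compares them with a reference score, both can be computed from rankings alone. The sketch below uses the standard definitions (AUC as the Mann-Whitney probability that a random positive outranks a random negative, Spearman as the Pearson correlation of average ranks); the function names are illustrative and this is not the paper's evaluation code.

```python
import math


def auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman is undefined for constant input")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    predicted = [0.91, 0.40, 0.75, 0.10]
    accepted = [1, 0, 1, 0]
    print(auc(predicted, accepted))
    print(spearman(predicted, [7.5, 4.0, 6.0, 3.5]))
```

Because both metrics depend only on the ordering of predictions, any strictly increasing rescaling of the predicted scores leaves them unchanged, which suits a model trained on pairwise comparisons.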