NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation

Penghai Zhao, Jinyu Tian, Qinghua Xing, Xin Zhang, Zheng Li, Jianjun Qian, Ming-Ming Cheng, Xiang Li

TL;DR

NAIPv2 tackles the challenge of scalable, unbiased paper quality estimation in two ways. It introduces a debiased pairwise learning framework with a confidence-aware Review Tendency Signal (RTS), and it constructs NAIDv2, a large-scale dataset debiased by domain and year. The model trains on pairwise comparisons within domain-year clusters but decouples training from inference, so deployment needs only fast, linear-time pointwise scoring. Empirically, it achieves state-of-the-art performance (AUC $0.782$, Spearman $\rho=0.432$), generalizes well to unseen NeurIPS submissions, and remains robust to distribution shifts. This work advances automated scholarly assessment and supports scalable, reliable literature intelligence in real-world settings.
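The following is a minimal sketch of the training/inference split described above. It rests on several assumptions that do not come from the paper:

- Each paper is already a fixed feature vector, the hypothetical `X`. The actual model is not specified here.
- The per-paper target is an RTS-like scalar.
- The learner is a linear scorer trained with a RankNet-style pairwise logistic loss.

Pairs are formed only between papers that share a (domain, year) key. At inference, each paper is scored once on its own.

```python
import itertools
from collections import defaultdict

import numpy as np


def build_pairs(groups, targets, min_gap=0.0):
    """Return (winner, loser) index pairs drawn only from within each group.

    groups:  hashable key per paper, e.g. a hypothetical (domain, year) tuple.
    targets: scalar supervision per paper (assumed to be an RTS-like value).
    min_gap: hypothetical margin; pairs whose targets differ by less are skipped.
    """
    members_by_group = defaultdict(list)
    for idx, key in enumerate(groups):
        members_by_group[key].append(idx)
    pairs = []
    for members in members_by_group.values():
        for i, j in itertools.combinations(members, 2):
            if targets[i] > targets[j] + min_gap:
                pairs.append((i, j))
            elif targets[j] > targets[i] + min_gap:
                pairs.append((j, i))
    return np.array(pairs, dtype=int).reshape(-1, 2)


def train_pairwise(X, pairs, lr=0.1, epochs=500):
    """Fit a linear scorer s(x) = x @ w by minimizing mean -log sigmoid(s_win - s_lose)."""
    w = np.zeros(X.shape[1])
    if len(pairs) == 0:
        return w
    diff = X[pairs[:, 0]] - X[pairs[:, 1]]  # winner features minus loser features
    for _ in range(epochs):
        margin = diff @ w
        # d/dm of log(1 + exp(-m)) is -sigmoid(-m); the tanh form avoids overflow.
        weight = 0.5 * (1.0 - np.tanh(margin / 2.0))
        grad = -(diff * weight[:, None]).mean(axis=0)
        w -= lr * grad
    return w


def score(X, w):
    """Pointwise inference: one dot product per paper, so cost is linear in N."""
    return X @ w
```

Typical use is `w = train_pairwise(X, build_pairs(keys, rts))` followed by `score(X_new, w)`. Training cost grows with the number of within-group pairs, which is quadratic in group size. Inference never forms pairs, which is what keeps deployment linear-time.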

Abstract

The ability to estimate the quality of scientific papers is central to how both humans and AI systems will advance scientific knowledge in the future. However, existing LLM-based estimation methods suffer from high inference cost, whereas the faster direct score regression approach is limited by scale inconsistencies. We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings and introduces the Review Tendency Signal (RTS) as a probabilistic integration of reviewer scores and confidences. To support training and evaluation, we further construct NAIDv2, a large-scale dataset of 24,276 ICLR submissions enriched with metadata and detailed structured content. Trained on pairwise comparisons but enabling efficient pointwise prediction at deployment, NAIPv2 achieves state-of-the-art performance (78.2% AUC, 0.432 Spearman), while maintaining scalable, linear-time efficiency at inference. Notably, on unseen NeurIPS submissions, it further demonstrates strong generalization, with predicted scores increasing consistently across decision categories from Rejected to Oral. These findings establish NAIPv2 as a debiased and scalable framework for automated paper quality estimation, marking a step toward future scientific intelligence systems. Code and dataset are released at sway.cloud.microsoft/Pr42npP80MfPhvj8.
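The abstract describes the RTS only as a probabilistic integration of reviewer scores and confidences. The Figure 2 caption adds that scores are modeled via Gaussian likelihoods. One plausible reading, sketched here purely as an assumption, works as follows:

- Each review is a Gaussian likelihood over a latent quality `q`, and its width shrinks as reviewer confidence grows.
- Maximizing the joint likelihood gives a precision-weighted mean of the scores.
- A "tendency" is then read off as the probability that `q` exceeds a threshold.

The confidence-to-sigma mapping, `base_sigma`, and `threshold` are hypothetical and not taken from the paper.

```python
import math


def confidence_to_sigma(confidence, base_sigma=2.0):
    """Hypothetical mapping: a more confident reviewer gets a narrower Gaussian."""
    if confidence <= 0:
        raise ValueError("confidence must be positive")
    return base_sigma / confidence


def fuse_reviews(scores, confidences, base_sigma=2.0):
    """Maximum-likelihood latent score under independent N(s_i | q, sigma_i^2).

    Returns (q_hat, sigma_post): the precision-weighted mean of the scores and
    the standard deviation of the resulting Gaussian (flat prior on q).
    """
    if not scores or len(scores) != len(confidences):
        raise ValueError("need one confidence per score and at least one review")
    precisions = [confidence_to_sigma(c, base_sigma) ** -2 for c in confidences]
    total = sum(precisions)
    q_hat = sum(p * s for p, s in zip(precisions, scores)) / total
    return q_hat, math.sqrt(1.0 / total)


def review_tendency(scores, confidences, threshold=5.0, base_sigma=2.0):
    """P(q > threshold) under N(q_hat, sigma_post^2); the threshold is hypothetical."""
    q_hat, sigma_post = fuse_reviews(scores, confidences, base_sigma)
    z = (q_hat - threshold) / sigma_post
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Precision weighting lets one confident review outweigh a less confident one. For example, `fuse_reviews([8.0, 3.0], [5, 1])` returns a fused score of about 7.81, close to the confident reviewer's 8.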

Paper Structure

This paper contains 33 sections, 15 equations, 9 figures, and 23 tables.

Figures (9)

  • Figure 1: Comparison of various frameworks. Autoregressive approaches (zhu-2025-deepreview; weng2025cycleresearcher) rely on sequential generation, which incurs substantial inference latency; NAIPv1 (zhao2025words) enables fast pointwise regression but suffers from scale inconsistency; NAIPv2 combines debiased pairwise training with confidence-aware signals and operates as an efficient pointwise regressor with linear-time complexity at inference.
  • Figure 2: Debiased data construction in NAIDv2. Titles and abstracts are embedded and hierarchically clustered into latent domains to mitigate domain bias. Raw scores, together with reviewer confidences, are modeled via Gaussian likelihoods to compute the RTS. The result is a normalized dataset for robust, debiased pairwise training.
  • Figure 3: Effect of clustering granularity.
  • Figure 4: Effect of difficulty bucketing.
  • Figure 5: Cross-venue evaluation on NeurIPS. (a) Predicted scores follow the hierarchical structure of conference decisions. (b) Distribution of rejected papers across score ranges.
  • ...and 4 more figures
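The Figure 2 caption says titles and abstracts are embedded and hierarchically clustered into latent domains, and Figure 3 varies the clustering granularity. The sketch below makes these assumptions:

- Embeddings are already computed. The embedding model is not stated here.
- The clustering is naive average-linkage agglomerative clustering on cosine distance. This is a swapped-in choice, not the paper's confirmed method.
- `n_domains` plays the role of granularity.

It then builds the (domain, year) groups within which training pairs would be formed. The loop is roughly cubic in the number of papers and only illustrates the idea. For real corpora, a library routine such as `scipy.cluster.hierarchy.linkage` would be the practical choice.

```python
from collections import defaultdict

import numpy as np


def cluster_domains(embeddings, n_domains):
    """Average-linkage agglomerative clustering on cosine distance.

    Starts with one cluster per paper and repeatedly merges the two clusters
    with the smallest mean pairwise distance until n_domains clusters remain.
    """
    if n_domains < 1:
        raise ValueError("n_domains must be at least 1")
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    dist = 1.0 - unit @ unit.T  # cosine distance between papers
    clusters = [[i] for i in range(len(unit))]
    while len(clusters) > n_domains:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(unit), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def group_by_domain_year(labels, years):
    """Map each (domain, year) key to the indices of the papers it contains."""
    groups = defaultdict(list)
    for idx, (d, y) in enumerate(zip(labels, years)):
        groups[(int(d), int(y))].append(idx)
    return dict(groups)
```

A smaller `n_domains` yields larger groups and more candidate pairs. A larger value compares each paper only against closer neighbors. Figure 3 reports the effect of this granularity.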