Navigating Through Paper Flood: Advancing LLM-based Paper Evaluation through Domain-Aware Retrieval and Latent Reasoning
Wuqiang Zheng, Yiyan Xu, Xinyu Lin, Chongming Gao, Wenjie Wang, Fuli Feng
TL;DR
The paper tackles the challenge of identifying high-quality research amid a rapidly growing volume of publications by proposing PaperEval, an LLM-based framework that combines domain-aware retrieval of concurrent work with latent reasoning to improve automated paper evaluation. It introduces a progressive ranking optimization that uses $k$ retrieved references and $m$ reasoning steps, with a temperature-controlled softmax and a ListMLE-style loss to refine relative rankings during training. PaperEval achieves state-of-the-art performance on two datasets (NAID and an ICLR-based quality dataset) and demonstrates practical impact by powering a real-world paper recommendation system that has drawn strong social-media engagement (over 8,000 subscribers, with many filtered papers attracting over 10,000 views). The work contributes a practical, scalable approach to up-to-date, nuanced paper evaluation and suggests future directions in supervising latent reasoning and integrating multimodal paper content. Code and data are released in the supplementary materials to support reproducibility.
Abstract
With the rapid and continuous increase in academic publications, identifying high-quality research has become an increasingly pressing challenge. While recent methods leveraging Large Language Models (LLMs) for automated paper evaluation have shown great promise, they are often constrained by outdated domain knowledge and limited reasoning capabilities. In this work, we present PaperEval, a novel LLM-based framework for automated paper evaluation that addresses these limitations through two key components: 1) a domain-aware paper retrieval module that retrieves relevant concurrent work to support contextualized assessments of novelty and contributions, and 2) a latent reasoning mechanism that enables deep understanding of complex motivations and methodologies, along with comprehensive comparison against related concurrent work, to support more accurate and reliable evaluation. To guide the reasoning process, we introduce a progressive ranking optimization strategy that encourages the LLM to iteratively refine its predictions with an emphasis on relative comparison. Experiments on two datasets demonstrate that PaperEval consistently outperforms existing methods in both academic impact and paper quality evaluation. In addition, we deploy PaperEval in a real-world paper recommendation system for filtering high-quality papers. The system has gained strong engagement on social media, amassing over 8,000 subscribers and attracting over 10,000 views for many of the filtered high-quality papers, which demonstrates the practical effectiveness of PaperEval.
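To make the ranking objective named in the TL;DR more concrete, the following is a minimal sketch of a generic ListMLE loss with a temperature-scaled softmax. It is not the authors' implementation: the function name `listmle_loss`, its parameters, and the choice of plain Python are assumptions made for illustration, and how PaperEval applies such a loss across its $m$ reasoning steps is not specified in this summary.

```python
import math


def listmle_loss(scores, labels, temperature=1.0):
    """Generic ListMLE loss with temperature scaling (illustrative sketch).

    scores: predicted scores, one per paper (hypothetical model outputs).
    labels: ground-truth quality or impact values; higher means better.
    temperature: softmax temperature; smaller values sharpen the distribution.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Ground-truth permutation: indices sorted by label, best paper first.
    order = sorted(range(len(labels)), key=lambda i: labels[i], reverse=True)
    scaled = [scores[i] / temperature for i in order]

    loss = 0.0
    for i in range(len(scaled)):
        # Negative log-probability of picking item i from the remaining items,
        # computed with a numerically stable log-sum-exp.
        tail = scaled[i:]
        m = max(tail)
        log_z = m + math.log(sum(math.exp(x - m) for x in tail))
        loss += log_z - scaled[i]
    return loss


# Scores that agree with the label order should give a lower loss
# than scores that reverse it.
aligned = listmle_loss([3.0, 2.0, 1.0], [3, 2, 1])
reversed_ = listmle_loss([1.0, 2.0, 3.0], [3, 2, 1])
print(aligned < reversed_)
```

Because the loss depends only on the ordering induced by the labels and on the relative predicted scores, it emphasizes relative comparison between papers rather than absolute score regression, which matches the motivation stated in the abstract.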
