A Hierarchical Framework for Measuring Scientific Paper Innovation via Large Language Models
Hongming Tan, Shaoxiong Zhan, Fengwei Jia, Hai-Tao Zheng, Wai Kin Chan
TL;DR
The paper addresses how to measure the innovation of a scientific paper, which goes beyond novelty and also requires assessing practical value. It introduces HSPIM, a training-free, hierarchical framework that splits a full-text paper into section-based chunks, augments each chunk with question-answer (QA) pairs, and uses zero-shot LLM prompts to produce novelty and confidence scores, which are then aggregated into a paper-level innovation score. A two-layer prompt structure, optimized with a genetic algorithm, jointly tunes common and section-specific prompts, and HSPIM+ extends the framework with a p-norm aggregation across novelty, contribution, and feasibility. The authors provide theoretical analyses of unbiasedness and convergence, together with experiments on peer-review datasets and domain-generalization tests. These show that HSPIM outperforms supervised baselines and other zero-shot LLM approaches, and that its QA rationales improve interpretability. Code is available for replication.
Abstract
Measuring scientific paper innovation is both important and challenging. Existing content-based methods often overlook the full-paper context, fail to capture the full scope of innovation, and lack generalization. We propose HSPIM, a hierarchical and training-free framework based on large language models (LLMs). It introduces a Paper-to-Sections-to-QAs decomposition to assess innovation. We segment the text by section titles and use zero-shot LLM prompting to implement section classification, question-answering (QA) augmentation, and weighted innovation scoring. The generated QA pairs focus on section-level innovation and serve as additional context to improve the LLM scoring. For each chunk, the LLM outputs a novelty score and a confidence score. We use the confidence scores as weights to aggregate the novelty scores into a paper-level innovation score. To further improve performance, we propose a two-layer question structure consisting of common and section-specific questions, and apply a genetic algorithm to optimize the question-prompt combinations. Furthermore, under the fine-grained structure of innovation, we extend HSPIM to HSPIM$^+$, which generates novelty, contribution, and feasibility scores, each with its own confidence score. Comprehensive experiments on scientific conference paper datasets show that HSPIM outperforms baseline methods in effectiveness, generalization, and interpretability. Demo code is available at https://github.com/Jasaxion/HSPIM.
