What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation

Abeer Mostafa, Thi Huyen Nguyen, Zahra Ahmadi

TL;DR

The paper tackles the subjectivity of research novelty assessment by proposing a literature-aware framework that learns reviewer-style novelty judgments from a large corpus of annotated peer reviews. It combines structured knowledge extraction, semantic retrieval, and a knowledge-graph–driven context with fine-tuning a large language model to produce calibrated novelty scores and grounded explanations. A large-scale benchmark of $79{,}973$ reviews enables paper-centric novelty summaries and a variety of aggregation strategies, yielding improved alignment with human judgments and better detection of idea-level overlap. The approach offers a scalable, transparent tool to support consistent novelty evaluation in high-volume AI research venues and to flag potential content overlap or plagiarism.

Abstract

Assessing research novelty is a core yet highly subjective aspect of peer review, typically based on implicit judgment and incomplete comparison to prior work. We introduce a literature-aware novelty assessment framework that explicitly learns how humans judge novelty from peer-review reports and grounds these judgments in structured comparison to existing research. Using nearly 80K novelty-annotated reviews from top-tier AI conferences, we fine-tune a large language model to capture reviewer-aligned novelty evaluation behavior. For a given manuscript, the system extracts structured representations of its ideas, methods, and claims, retrieves semantically related papers, and constructs a similarity graph that enables fine-grained, concept-level comparison to prior work. Conditioning on this structured evidence, the model produces calibrated novelty scores and human-like explanatory assessments, reducing overestimation and improving consistency relative to existing approaches.
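The retrieval and graph-construction steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for whatever learned embedding the system uses, and the names `retrieve` and `similarity_graph`, the `threshold` value, and the toy corpus format are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k (paper_id, similarity) pairs most similar to the query text."""
    q = embed(query)
    scored = [(pid, cosine(q, embed(text))) for pid, text in corpus.items()]
    # Sort by descending similarity; break ties by paper id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


def similarity_graph(manuscript_concepts, related_concepts, threshold=0.3):
    """Link each manuscript concept to prior-work concepts above a threshold.

    related_concepts maps paper_id -> list of concept strings (assumed format).
    Returns edges as (manuscript_concept, paper_id, prior_concept, similarity).
    """
    edges = []
    for m_concept in manuscript_concepts:
        m_vec = embed(m_concept)
        for pid, concepts in related_concepts.items():
            for p_concept in concepts:
                sim = cosine(m_vec, embed(p_concept))
                if sim >= threshold:
                    edges.append((m_concept, pid, p_concept, sim))
    return edges
```

In a sketch like this, the edges returned by `similarity_graph` would be the concept-level evidence handed to the fine-tuned model; how the actual system encodes that evidence in its prompt is not specified in this summary.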

Paper Structure (19 sections, 1 equation, 6 figures, 2 tables)

Figures (6)

  • Figure 1: End-to-end framework for novelty assessment. Human peer-review data is aggregated into a dataset reflecting novelty evaluations. A semantic search and retrieval pipeline then provides the model with the most closely related papers for comparison.
  • Figure 2: Distribution of predicted novelty scores across models.
  • Figure 3: Prompt for free text evaluation using LLM as judge.
  • Figure 4: Distribution of predicted novelty scores across different models compared to ground truth. Histograms show the frequency of novelty scores assigned by various LLM-based reviewers and baselines. Red dashed lines indicate the mean score, while blue dotted lines denote the median for each model.
  • Figure 5: Prompt for human peer-reviews aggregation.
  • ...and 1 more figure