What Is Novel? A Knowledge-Driven Framework for Bias-Aware Literature Originality Evaluation

Abeer Mostafa; Thi Huyen Nguyen; Zahra Ahmadi

TL;DR

The paper tackles the subjectivity of research novelty assessment by proposing a literature-aware framework that learns reviewer-style novelty judgments from a large corpus of annotated peer reviews. It combines structured knowledge extraction, semantic retrieval, and knowledge-graph-driven context with a fine-tuned large language model to produce calibrated novelty scores and grounded explanations. A large-scale benchmark of 79,973 reviews enables paper-centric novelty summaries and a variety of aggregation strategies, yielding improved alignment with human judgments and better detection of idea-level overlap. The approach offers a scalable, transparent tool to support consistent novelty evaluation in high-volume AI research venues and to flag potential content overlap or plagiarism.

Abstract

Assessing research novelty is a core yet highly subjective aspect of peer review, typically based on implicit judgment and incomplete comparison to prior work. We introduce a literature-aware novelty assessment framework that explicitly learns how humans judge novelty from peer-review reports and grounds these judgments in structured comparison to existing research. Using nearly 80K novelty-annotated reviews from top-tier AI conferences, we fine-tune a large language model to capture reviewer-aligned novelty evaluation behavior. For a given manuscript, the system extracts structured representations of its ideas, methods, and claims, retrieves semantically related papers, and constructs a similarity graph that enables fine-grained, concept-level comparison to prior work. Conditioning on this structured evidence, the model produces calibrated novelty scores and human-like explanatory assessments, reducing overestimation and improving consistency relative to existing approaches.
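
The retrieval and similarity-graph steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity stands in for whatever learned embedding the paper uses, and the function names (`embed`, `retrieve`, `similarity_graph`), the field names, the toy corpus, and the 0.2 threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumed stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the ids of the k corpus entries most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda pid: cosine(q, embed(corpus[pid])), reverse=True)
    return ranked[:k]


def similarity_graph(manuscript, related, threshold=0.2):
    """Link each manuscript field to related papers whose similarity
    reaches the (hypothetical) threshold. Returns (field, paper_id, score)."""
    edges = []
    for field, text in manuscript.items():
        for pid, paper_text in related.items():
            score = cosine(embed(text), embed(paper_text))
            if score >= threshold:
                edges.append((field, pid, round(score, 3)))
    return edges


# Hypothetical structured representation of a manuscript and a toy corpus.
manuscript = {
    "idea": "graph based novelty scoring of papers",
    "method": "fine tune language model on peer reviews",
}
corpus = {
    "p1": "novelty scoring of scientific papers with graph features",
    "p2": "image segmentation with convolutional networks",
    "p3": "language model fine tune on peer reviews for score prediction",
}

top = retrieve(" ".join(manuscript.values()), corpus, k=2)
edges = similarity_graph(manuscript, {pid: corpus[pid] for pid in top})
for edge in edges:
    print(edge)
```

In this sketch each edge ties one extracted field of the manuscript to one retrieved paper, which is the kind of concept-level evidence a fine-tuned model could be conditioned on when scoring novelty.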

Paper Structure (19 sections, 1 equation, 6 figures, 2 tables)

Introduction and Related Work
Benchmark Construction
Methodology
Structured Knowledge Extraction:
Retrieval of Semantically Related Papers and KG Creation:
Model Fine-Tuning for Novelty Assessment:
Experiments
Case Study: Idea-level Plagiarism Detection
Conclusion
Limitations
Ethical Considerations
Further Details
Evaluation Metrics
Entailment–Contradiction (E-C) NLI Evaluation Metric:
LLM Judge:
...and 4 more sections

Figures (6)

Figure 1: End-to-end framework for novelty assessment. Human peer-review data is aggregated into a dataset reflecting novelty evaluations. A semantic search and retrieval pipeline then provides the model with the top related papers for comparison.
Figure 2: Distribution of predicted novelty scores across models.
Figure 3: Prompt for free text evaluation using LLM as judge.
Figure 4: Distribution of predicted novelty scores across different models compared to ground truth. Histograms show the frequency of novelty scores assigned by various LLM-based reviewers and baselines. Red dashed lines indicate the mean score, while blue dotted lines denote the median for each model.
Figure 5: Prompt for human peer-reviews aggregation.
...and 1 more figure
