NovAScore: A New Automated Metric for Evaluating Document Level Novelty

Lin Ai, Ziwei Gong, Harshsaiprasad Deshpande, Alexander Johnson, Emmy Phung, Ahmad Emami, Julia Hirschberg

TL;DR

This work introduces NovAScore (Novelty Evaluation in Atomicity Score), an automated metric for evaluating document-level novelty that aggregates the novelty and salience scores of atomic information, providing high interpretability and a detailed analysis of a document's novelty.

Abstract

The rapid expansion of online content has intensified the issue of information redundancy, underscoring the need for solutions that can identify genuinely new information. Despite this challenge, the research community has seen a decline in focus on novelty detection, particularly with the rise of large language models (LLMs). Additionally, previous approaches have relied heavily on human annotation, which is time-consuming, costly, and particularly challenging when annotators must compare a target document against a vast number of historical documents. In this work, we introduce NovAScore (Novelty Evaluation in Atomicity Score), an automated metric for evaluating document-level novelty. NovAScore aggregates the novelty and salience scores of atomic information, providing high interpretability and a detailed analysis of a document's novelty. With its dynamic weight adjustment scheme, NovAScore offers enhanced flexibility and an additional dimension to assess both the novelty level and the importance of information within a document. Our experiments show that NovAScore strongly correlates with human judgments of novelty, achieving a 0.626 Point-Biserial correlation on the TAP-DLND 1.0 dataset and a 0.920 Pearson correlation on an internal human-annotated dataset.

Paper Structure

This paper contains 46 sections, 1 equation, 4 figures, and 10 tables.

Figures (4)

  • Figure 1: Conceptual illustration of novelty and salient information retrieval in real-world applications.
  • Figure 2: The NovAScore framework. The target document is first decomposed into ACUs. ACU-level novelty is assessed by comparing each ACU against the ACUBank of historical documents, while salience is determined by whether the ACU is included in the document's summary. The final NovAScore is calculated by aggregating the scores of the ACUs. ACUs can be stored in the ACUBank for future analysis if necessary.
  • Figure 3: The top plot shows the weights for salient ($w_s$) and non-salient ($w_{ns}$) ACUs across different salience ratios with dynamic weight adjustment. The bottom plot compares the maximum NovAScore of 100 ACUs with and without weight adjustment. Both plots use $\alpha=1$, $\beta=0.5$, and $\gamma=0.7$.
  • Figure 4: Search time for similar ACUs per ACU at varying ACUBank sizes with a single database.
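
The aggregation step described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the paper's implementation: the `ACU` class, the `aggregate_novelty` function, the binary novelty labels, and the fixed weights `w_salient` and `w_non_salient` are all assumptions made for illustration. In particular, the sketch uses constant weights and does not reproduce the dynamic weight adjustment scheme shown in Figure 3, whose formula is not given in this section.

```python
from dataclasses import dataclass


@dataclass
class ACU:
    """Hypothetical container for one atomic content unit."""
    text: str
    novel: bool    # assumed binary novelty label (not found in the ACUBank)
    salient: bool  # assumed label: True if the ACU appears in the summary


def aggregate_novelty(acus, w_salient=1.0, w_non_salient=0.5):
    """Weighted share of novel ACUs, using fixed illustrative weights.

    The weights are placeholders, not values taken from the paper.
    """
    if not acus:
        return 0.0
    weights = [w_salient if a.salient else w_non_salient for a in acus]
    total = sum(weights)
    if total == 0:
        return 0.0
    novel_weight = sum(w for a, w in zip(acus, weights) if a.novel)
    return novel_weight / total


acus = [
    ACU("Company X acquired company Y.", novel=True, salient=True),
    ACU("Company X is based in city Z.", novel=False, salient=True),
    ACU("The deal closed on a Friday.", novel=True, salient=False),
]
score = aggregate_novelty(acus)
print(round(score, 3))
```

With these placeholder weights, a novel salient ACU contributes twice as much to the score as a novel non-salient one, which mirrors the idea of weighing novelty by importance.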