Freshness and Informativity Weighted Cognitive Extent and Its Correlation with Cumulative Citation Count

Zihe Wang, Jian Wu

TL;DR

The paper addresses limitations of the original cognitive extent by introducing Freshness and Informativity Weighted Cognitive Extent (FICE), which weights the unique scientific entities in paper titles by their freshness, derived from a lifetime ratio, and by their informativity, derived from time-dependent document frequency. It defines the lifetime ratio as $r(e,t_0)=\frac{\sum_{t=t_s}^{t_0}df(e,t)}{\sum_{t=t_s}^{t_e}df(e,t)}$, where $t_s$ and $t_e$ mark the start and end of the lifetime of entity $e$ and $t_0$ is the publication time, and freshness as $1-r(e,t_0)$. Informativity is computed as $w(e,t_0)=1-\frac{DF(e,t_0)-DF_{\min}}{DF_{\max}-DF_{\min}}$, where $DF(e,t_0)=\sum_{t=t_s}^{t_0}df(e,t)$ and the minimum and maximum are taken over the scientific entities recognized in a title. Document frequencies $df(e,t)$ are modeled as a composite of Gaussian profiles and fitted with Adam optimization to enable predictions beyond the observable period. Using ACL Anthology data, the authors show that the number of unique entities per quota grows slowly over time and that FICE has a strong positive correlation with $\log{C_5}$, supporting its potential as a predictor of topic-level citation impact. The work provides a reproducible framework with available code and highlights the value of incorporating freshness and informativity into measurements of cognitive extent.
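The two weights defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based input format (year mapped to document frequency), the use of the first and last observed years as $t_s$ and $t_e$, and the handling of the degenerate case where all entities in a title share the same $DF$ are assumptions made here.

```python
from typing import Dict

# Hypothetical input format: one {year: document frequency} mapping per entity.
Series = Dict[int, float]


def cumulative_df(series: Series, t_start: int, t_end: int) -> float:
    """Sum df(e, t) for t in [t_start, t_end]; missing years count as 0."""
    return sum(series.get(t, 0.0) for t in range(t_start, t_end + 1))


def lifetime_ratio(series: Series, t0: int) -> float:
    """r(e, t0): cumulative df up to t0 divided by cumulative df over the lifetime.

    Assumption: the lifetime [t_s, t_e] is the span of years present in `series`.
    """
    t_s, t_e = min(series), max(series)
    total = cumulative_df(series, t_s, t_e)
    if total == 0:
        return 0.0  # assumption: an entity with no occurrences has ratio 0
    return cumulative_df(series, t_s, t0) / total


def freshness(series: Series, t0: int) -> float:
    """Freshness is 1 - r(e, t0)."""
    return 1.0 - lifetime_ratio(series, t0)


def informativity(title_entities: Dict[str, Series], t0: int) -> Dict[str, float]:
    """w(e, t0) = 1 - (DF - DF_min) / (DF_max - DF_min) over one title's entities."""
    df_cum = {e: cumulative_df(s, min(s), t0) for e, s in title_entities.items()}
    lo, hi = min(df_cum.values()), max(df_cum.values())
    if hi == lo:
        # Assumption: when normalization is undefined, give every entity weight 1.
        return {e: 1.0 for e in df_cum}
    return {e: 1.0 - (v - lo) / (hi - lo) for e, v in df_cum.items()}
```

For example, an entity with yearly document frequencies 1, 3, 4, 2 over 2000 to 2003 has $r = 4/10$ at $t_0 = 2001$ and therefore a freshness of $0.6$. Within a title, the entity with the smallest cumulative document frequency receives informativity 1 and the one with the largest receives 0.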

Abstract

In this paper, we revisit cognitive extent, originally defined as the number of unique phrases in a quota. We introduce Freshness and Informativity Weighted Cognitive Extent (FICE), calculated based on two novel weighting factors, the lifetime ratio and informativity of scientific entities. We model the lifetime of each scientific entity as the time-dependent document frequency, which is fitted with a composite of multiple Gaussian profiles. The lifetime ratio is then calculated as the cumulative document frequency at the publication time $t_0$ divided by the cumulative document frequency over its entire lifetime. The informativity is calculated by normalizing the document frequency across all scientific entities recognized in a title. Using the ACL Anthology, we verified the trend formerly observed in several other domains that the number of unique scientific entities per quota increased gradually at a slower rate. We found that FICE exhibits a strong correlation with the average cumulative citation count within a quota. Our code is available at https://github.com/ZiheHerzWang/Freshness-and-Informativity-Weighted-Cognitive-Extent
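To illustrate how a fitted composite of Gaussian profiles lets the lifetime ratio be evaluated beyond the observed years, the sketch below evaluates such a profile and sums it over a lifetime supplied by the caller. It assumes the Gaussian parameters have already been fitted (the fitting step itself is not shown), and the parameterization as (amplitude, center year, width) tuples, the function names, and the choice of lifetime end are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical parameterization: (amplitude, center year, standard deviation).
Gaussian = Tuple[float, float, float]


def composite_df(t: float, components: List[Gaussian]) -> float:
    """Modeled document frequency at time t as a sum of Gaussian profiles."""
    return sum(
        a * math.exp(-0.5 * ((t - mu) / sigma) ** 2) for a, mu, sigma in components
    )


def modeled_lifetime_ratio(
    components: List[Gaussian], t_start: int, t0: int, t_end: int
) -> float:
    """Lifetime ratio from the modeled curve, summed over yearly steps.

    t_end may lie beyond the last observed year, since the fitted curve
    can be evaluated at any time. How t_end is chosen is an assumption
    left to the caller in this sketch.
    """
    total = sum(composite_df(t, components) for t in range(t_start, t_end + 1))
    if total <= 0:
        return 0.0  # assumption: a flat-zero profile has ratio 0
    up_to_t0 = sum(composite_df(t, components) for t in range(t_start, t0 + 1))
    return up_to_t0 / total
```

With a single profile centered at 2010 and a lifetime of 2000 to 2020, the ratio is slightly below one half at $t_0 = 2009$ and slightly above one half at $t_0 = 2010$, reflecting that about half of the entity's modeled lifetime has elapsed at its peak.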

Paper Structure

This paper contains 18 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The number of papers, scientific entities (undisambiguated), and disambiguated entities in the ACL Corpus.
  • Figure 2: The document frequency chart (blue) of the entity "machine learning" and its fit with 4 Gaussian profiles.
  • Figure 3: The FICE calculated using disambiguated scientific entities (black dots) with $|Q|=125, 250, 500$. The red and blue curves are the polynomial fittings of disambiguated and undisambiguated entities, respectively. For each year, we only plot data points that represent full quotas of papers.
  • Figure 4: Average FICE calculated using undisambiguated entities per quota vs. the $\log{C_5}$. Paper titles are grouped into a bin size of $250$.