Freshness and Informativity Weighted Cognitive Extent and Its Correlation with Cumulative Citation Count
Zihe Wang, Jian Wu
TL;DR
The paper addresses limitations of the original cognitive extent by introducing Freshness and Informativity Weighted Cognitive Extent (FICE), which weights the unique scientific entities in paper titles by their freshness, derived from a lifetime ratio, and by their informativity, derived from time-dependent document frequency. The lifetime ratio is defined as $r(e,t_0)=\frac{\sum_{t=t_s}^{t_0}df(e,t)}{\sum_{t=t_s}^{t_e}df(e,t)}$ and freshness as $1-r(e,t_0)$. Informativity is computed as $w(e,t_0)=1-\frac{DF(e,t_0)-DF_{\min}}{DF_{\max}-DF_{\min}}$, where $DF(e,t_0)=\sum_{t=t_s}^{t_0}df(e,t)$ is the cumulative document frequency up to the publication time $t_0$, and the normalization runs over the scientific entities recognized in a title. Document frequencies $df(e,t)$ are modeled as a composite of Gaussian profiles fitted with the Adam optimizer, which enables predictions beyond the observable period. Using ACL Anthology data, the authors show that the number of unique entities per quota grows slowly over time and that FICE has a strong positive correlation with $\log{C_5}$, supporting its potential as a predictor of topic-level citation impact. The work provides a reproducible framework with publicly available code and highlights the value of incorporating freshness and informativity into measurements of cognitive extent.
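To make the two weighting factors concrete, here is a minimal Python sketch of the lifetime ratio, freshness, and informativity formulas as summarized above. It is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary-based input format, and the handling of the degenerate case where all entities share the same cumulative document frequency are hypothetical choices.

```python
from typing import Dict, List


def cumulative_df(df_by_year: Dict[int, float], start: int, end: int) -> float:
    """Sum df(e, t) for t = start..end inclusive; years without data count as 0."""
    return sum(df_by_year.get(t, 0.0) for t in range(start, end + 1))


def lifetime_ratio(df_by_year: Dict[int, float], t_s: int, t_0: int, t_e: int) -> float:
    """r(e, t_0): cumulative df up to t_0 divided by cumulative df over the lifetime."""
    if not t_s <= t_0 <= t_e:
        raise ValueError("expected t_s <= t_0 <= t_e")
    total = cumulative_df(df_by_year, t_s, t_e)
    if total == 0:
        raise ValueError("entity has zero document frequency over its lifetime")
    return cumulative_df(df_by_year, t_s, t_0) / total


def freshness(df_by_year: Dict[int, float], t_s: int, t_0: int, t_e: int) -> float:
    """Freshness is 1 - r(e, t_0): an entity early in its lifetime scores near 1."""
    return 1.0 - lifetime_ratio(df_by_year, t_s, t_0, t_e)


def informativity(cumulative_dfs: List[float]) -> List[float]:
    """w = 1 - (DF - DF_min) / (DF_max - DF_min) over one group of entities."""
    lo, hi = min(cumulative_dfs), max(cumulative_dfs)
    if hi == lo:
        # Assumption (not from the paper): if all entities tie, give each weight 1.
        return [1.0 for _ in cumulative_dfs]
    return [1.0 - (x - lo) / (hi - lo) for x in cumulative_dfs]


# Toy yearly document frequencies for one hypothetical entity.
toy_df = {2000: 1, 2001: 3, 2002: 4, 2003: 2}
toy_ratio = lifetime_ratio(toy_df, t_s=2000, t_0=2001, t_e=2003)
toy_freshness = freshness(toy_df, t_s=2000, t_0=2001, t_e=2003)
# Cumulative document frequencies of three hypothetical entities in one title.
toy_weights = informativity([4, 10, 7])
```

In this toy setup, the entity with the smallest cumulative document frequency receives the highest informativity weight, which matches the intent that rarely used entities carry more information. How the two weights are combined into the final FICE score is not spelled out in this summary, so the sketch stops at the individual factors.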
Abstract
In this paper, we revisit cognitive extent, originally defined as the number of unique phrases in a quota. We introduce Freshness and Informativity Weighted Cognitive Extent (FICE), calculated based on two novel weighting factors: the lifetime ratio and the informativity of scientific entities. We model the lifetime of each scientific entity as its time-dependent document frequency, which is fitted with a composition of multiple Gaussian profiles. The lifetime ratio is then calculated as the cumulative document frequency at the publication time $t_0$ divided by the cumulative document frequency over the entity's entire lifetime. The informativity is calculated by normalizing the document frequency across all scientific entities recognized in a title. Using the ACL Anthology, we verify the trend formerly observed in several other domains: the number of unique scientific entities per quota increases only gradually over time. We find that FICE exhibits a strong correlation with the average cumulative citation count within a quota. Our code is available at https://github.com/ZiheHerzWang/Freshness-and-Informativity-Weighted-Cognitive-Extent.
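The role of the Gaussian lifetime model can be illustrated with a short Python sketch. It evaluates a document frequency curve built from Gaussian profiles and derives the lifetime ratio from it, including for a lifetime that extends past the last observed year. The component parameters below are hypothetical rather than fitted values, the fitting step itself is omitted, and the yearly summation grid is an assumption made for illustration.

```python
import math
from typing import List, Tuple

# One Gaussian profile: (amplitude, center year, standard deviation in years).
Gaussian = Tuple[float, float, float]


def df_model(t: float, components: List[Gaussian]) -> float:
    """Modeled document frequency at time t as a sum of Gaussian profiles."""
    return sum(
        a * math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2))
        for a, mu, sigma in components
    )


def modeled_lifetime_ratio(
    components: List[Gaussian], t_s: int, t_0: int, t_e: int
) -> float:
    """Lifetime ratio computed from the modeled curve on a yearly grid.

    t_e may lie beyond the observed period, since the curve can be
    evaluated at any year once its parameters are known.
    """
    if not t_s <= t_0 <= t_e:
        raise ValueError("expected t_s <= t_0 <= t_e")
    seen = sum(df_model(t, components) for t in range(t_s, t_0 + 1))
    total = sum(df_model(t, components) for t in range(t_s, t_e + 1))
    if total == 0:
        raise ValueError("modeled document frequency is zero over the lifetime")
    return seen / total


# Hypothetical two-peak entity: an early small peak and a later, larger revival.
toy_components = [(5.0, 2005.0, 2.0), (20.0, 2018.0, 3.0)]
ratio_mid_life = modeled_lifetime_ratio(toy_components, t_s=2000, t_0=2012, t_e=2030)
freshness_mid_life = 1.0 - ratio_mid_life
```

With these toy parameters, most of the modeled usage lies after 2012, so a title published in 2012 that mentions the entity would receive a high freshness value.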
