Optimizing Research Portfolio For Semantic Impact

Alexander V. Belikov

TL;DR

The paper addresses biases in citation-based impact metrics by proposing Semantic Impact (SI), a graph-theoretic measure derived from evolving knowledge graphs built from 324K biomedical preprints. It introduces a TriEL-based KG pipeline, defines SI as a log-transformed subgraph-growth score, and demonstrates up to $R^2 \approx 0.69$ predictive accuracy at a 36-month horizon using semantic features. It then formulates a 0-1 knapsack-like portfolio optimization to maximize predicted SI while controlling risk, solving it with OR-Tools and showing SI-driven portfolios outperform random allocations. Overall, SI offers a complementary tool for guiding funding and publishing decisions, with potential to mitigate biases and accelerate scientific knowledge growth, while acknowledging data limitations and suggesting avenues for enhancement such as expanded data sources and advanced graph representations.

Abstract

Citation metrics are widely used to assess academic impact but suffer from social biases, including institutional prestige and journal visibility. Here we introduce rXiv Semantic Impact (XSI), a novel framework that predicts research impact by analyzing how scientific semantic graphs evolve in the underlying fabric of science. Rather than counting citations, XSI tracks the evolution of research concepts in the academic knowledge graph (KG). Starting with the construction of a comprehensive KG from 324K biomedical publications (2003-2025), we demonstrate that XSI can predict a paper's future semantic impact (SI) with remarkable accuracy ($R^2$ = 0.69) three years in advance. We leverage these predictions to develop an optimization framework for research portfolio selection that systematically outperforms random allocation. We propose SI as a complementary metric to citations and present XSI as a tool to guide funding and publishing decisions, enhancing research impact while mitigating risk.
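
The portfolio step summarized above is described as a 0-1 knapsack-like problem that the paper solves with OR-Tools. The sketch below is only an illustration of that problem class: it swaps in a plain dynamic-programming solver instead of OR-Tools, and the function name `select_portfolio`, the integer `cost` per paper, the `budget`, and the risk-adjusted objective (predicted SI minus `risk_aversion` times a risk estimate) are all assumptions made for the example, not the paper's formulation.

```python
from typing import List, Tuple


def select_portfolio(
    predicted_si: List[float],
    risk: List[float],
    cost: List[int],
    budget: int,
    risk_aversion: float = 1.0,
) -> Tuple[float, List[int]]:
    """Pick a subset of papers maximizing a risk-adjusted SI score.

    Hypothetical objective: sum of (predicted_si - risk_aversion * risk)
    over the chosen papers, subject to total integer cost <= budget.
    Solved as a 0-1 knapsack by dynamic programming.
    """
    n = len(predicted_si)
    value = [predicted_si[i] - risk_aversion * risk[i] for i in range(n)]

    # best[i][b]: best total value using only the first i papers with budget b.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip paper i-1
            if cost[i - 1] <= b:
                candidate = best[i - 1][b - cost[i - 1]] + value[i - 1]
                if candidate > best[i][b]:
                    best[i][b] = candidate  # option 2: take paper i-1

    # Walk the table backwards to recover which papers were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= cost[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Toy inputs (made-up numbers, not taken from the paper).
total, picks = select_portfolio(
    predicted_si=[3.0, 2.0, 4.0, 1.0],
    risk=[0.5, 0.1, 2.0, 0.0],
    cost=[2, 1, 3, 1],
    budget=4,
)
print(picks, round(total, 2))
```

In this toy run the paper with the highest predicted SI (index 2) is left out because its assumed risk penalty and cost make it less attractive than the combination of the other three, which is the kind of trade-off a risk-controlled selection is meant to capture.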

Paper Structure

This paper contains 11 sections, 3 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: The positive feedback loop in research optimization: Preprints and research proposals enter the Semantic Impact System (XSI), where they are transformed into semantic knowledge graphs. Predictive models assess their future semantic impact and associated risks. This information guides researchers in shaping their work and aids publishers and funding agencies in making data-driven decisions on publications and support. These decisions, in turn, accelerate the growth of the global academic knowledge graph.
  • Figure 2: Left: moving average of SI $J_\pi$ with a window of 180 days. Right: Spearman correlation of SI $J_\pi$ and citation counts from OpenAlex over a 90-day window, using the total citation count as of the OpenAlex data fetch.
  • Figure 3: Left: line plots of the coefficient of determination $R^2$ for the validation sample as a function of time. Right: violin plots of the distributions of the coefficient of determination $R^2$ for the training and validation samples (left and right halves, lighter and darker hues of the same color, respectively). The target is transformed, and both the target and the features are scaled. The training period is 36 months. The cases represent the target and different prediction horizons $\Delta = 6, 12, 18, 24$ and 36 months.
  • Figure 4: Left: SI $J_\pi$ - SI prediction error $\delta_\pi$ plot for prediction horizons $\Delta$ = 12 months (top) and 36 months (bottom). Right: Optimal portfolio performance for prediction horizons $\Delta$ = 12 months (top) and 36 months (bottom). The top 3 lines on both plots correspond to the selection of $5, 10, 20\%$ of available publications for the period; the bottom line (blue), with the band denoting the standard deviation error, corresponds to a random selection of publications available for each
  • Figure 5: Stage A: each abstract is represented as a directed graph of entities. Stage B: the union of the directed graphs of entities forms a complete domain knowledge graph. Stage C: a series of KGs where each following KG contains the previous directed graphs $G_r(t)$.
  • ...and 8 more figures
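
The three stages in the Figure 5 caption can be pictured with a small sketch: per-abstract directed graphs are merged by union, and a series of snapshots is built so that each later snapshot contains the earlier ones. This is a minimal illustration under stated assumptions, not the paper's pipeline: graphs are reduced to sets of (source, target) entity pairs, `cumulative_graphs` is a hypothetical helper, and the entity names and dates are invented.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge between two entity names


def cumulative_graphs(
    abstract_graphs: List[Tuple[date, Set[Edge]]],
    cutoffs: List[date],
) -> Dict[date, Set[Edge]]:
    """Return one edge-set snapshot per cutoff date.

    Each snapshot is the union of all per-abstract graphs dated on or
    before the cutoff, so later snapshots always contain earlier ones.
    """
    ordered = sorted(abstract_graphs, key=lambda item: item[0])
    snapshots: Dict[date, Set[Edge]] = {}
    current: Set[Edge] = set()
    idx = 0
    for cutoff in sorted(cutoffs):
        # Fold in every abstract published up to this cutoff.
        while idx < len(ordered) and ordered[idx][0] <= cutoff:
            current |= ordered[idx][1]
            idx += 1
        snapshots[cutoff] = set(current)  # copy, so snapshots stay independent
    return snapshots


# Stage A: invented per-abstract graphs (placeholder entities and dates).
abstracts = [
    (date(2020, 1, 10), {("gene_a", "protein_b")}),
    (date(2020, 6, 5), {("protein_b", "pathway_c"), ("gene_a", "protein_b")}),
    (date(2021, 2, 1), {("pathway_c", "disease_d")}),
]

# Stages B and C: unions up to each cutoff form a nested series of graphs.
series = cumulative_graphs(abstracts, [date(2020, 3, 1), date(2020, 12, 31), date(2021, 12, 31)])
for cutoff, edges in series.items():
    print(cutoff, len(edges))
```

Comparing consecutive snapshots gives the edges that appeared in between, which is one simple way to look at graph growth over time; how the paper turns such growth into the SI score is not specified in this summary.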