Optimizing Research Portfolio For Semantic Impact
Alexander V. Belikov
TL;DR
The paper addresses biases in citation-based impact metrics by proposing Semantic Impact (SI), a graph-theoretic measure derived from evolving knowledge graphs built from 324K biomedical preprints. It introduces a TriEL-based KG pipeline, defines SI as a log-transformed subgraph-growth score, and reports predictive performance of up to $R^2 \approx 0.69$ at a 36-month horizon using semantic features. It then formulates a 0-1 knapsack-like portfolio optimization that maximizes predicted SI while controlling risk, solves it with OR-Tools, and shows that SI-driven portfolios outperform random allocations. Overall, SI offers a complementary tool for guiding funding and publishing decisions, with the potential to mitigate biases and accelerate the growth of scientific knowledge. The paper acknowledges data limitations and suggests avenues for enhancement, such as expanded data sources and advanced graph representations.
Abstract
Citation metrics are widely used to assess academic impact but suffer from social biases, including institutional prestige and journal visibility. Here we introduce rXiv Semantic Impact (XSI), a novel framework that predicts research impact by analyzing how scientific semantic graphs evolve in the underlying fabric of science. Rather than counting citations, XSI tracks the evolution of research concepts in the academic knowledge graph (KG). Starting from a comprehensive KG constructed from 324K biomedical publications (2003-2025), we demonstrate that XSI can predict a paper's future semantic impact (SI) three years in advance with $R^2 = 0.69$. We leverage these predictions to develop an optimization framework for research portfolio selection that systematically outperforms random allocation. We propose SI as a complementary metric to citations and present XSI as a tool to guide funding and publishing decisions, enhancing research impact while mitigating risk.
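
The portfolio selection step can be illustrated with a minimal sketch of a 0-1 knapsack. The exact objective and risk constraint are not given in this section, so the sketch assumes a simple risk-penalized score (predicted SI minus a penalty times a per-paper risk estimate) and integer costs under a fixed budget. It uses a plain dynamic program in place of OR-Tools so that it runs without extra dependencies. All names (`select_portfolio`, `risk_adjusted`, `risk_aversion`) and all numbers are hypothetical.

```python
from typing import List, Tuple


def risk_adjusted(predicted_si: List[float], risk: List[float],
                  risk_aversion: float) -> List[float]:
    """Assumed scoring rule: predicted SI minus a linear risk penalty."""
    return [s - risk_aversion * r for s, r in zip(predicted_si, risk)]


def select_portfolio(scores: List[float], costs: List[int],
                     budget: int) -> Tuple[float, List[int]]:
    """Solve a 0-1 knapsack by dynamic programming over integer budgets.

    Returns the best total score and the indices of the selected items.
    """
    n = len(scores)
    # best[i][b]: maximum total score using the first i items with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score, cost = scores[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip item i-1
            if cost <= b:
                candidate = best[i - 1][b - cost] + score
                if candidate > best[i][b]:  # option 2: take item i-1
                    best[i][b] = candidate

    # Walk back through the table to recover which items were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Toy data, invented for illustration only.
predicted_si = [3.0, 2.0, 4.0, 1.0]
risk = [1.0, 0.0, 4.0, 0.0]
costs = [2, 1, 3, 1]

scores = risk_adjusted(predicted_si, risk, risk_aversion=0.5)
total, chosen = select_portfolio(scores, costs, budget=4)
print(total, chosen)
```

In this toy case the paper with the highest raw predicted SI (index 2) is left out because its assumed risk penalty and cost make the other three a better use of the budget. A solver such as OR-Tools would be preferable for larger instances or additional constraints.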
