Detecting signal from science: The structure of research communities and prior knowledge improves prediction of genetic regulatory experiments
Alexander V. Belikov, Andrey Rzhetsky, James Evans
TL;DR
This work tackles the challenge of navigating prior knowledge and reproducibility in biomedical literature by introducing a Bayesian framework that integrates claims from GeneWays and Literome with large-scale LINCS L1000 data. It partitions gene regulatory interactions into neutral, negative, and positive classes using data-driven thresholds, and builds a rich set of interaction- and batch-level features to predict neutrality, positivity, and claim correctness. The study demonstrates that scientifically focused yet institutionally diverse activity enhances replicability and shows how policy choices that broaden research communities can improve overall predictive power and robustness. Collectively, the approach provides a scalable, data-driven way to decode bias, estimate replicability, and guide science funding toward more reliable discoveries.
Abstract
The explosive growth in the number of scientists, scientific journals, articles and findings in recent years exponentially increases the difficulty scientists face in navigating prior knowledge. This challenge is exacerbated by uncertainty about the reproducibility of published findings. The availability of massive digital archives, machine reading and extraction tools on the one hand, and automated high-throughput experiments on the other, allows us to evaluate these challenges at scale and identify novel opportunities for accelerating scientific advance. Here we demonstrate a Bayesian calculus that enables the positive prediction of robust, replicable scientific claims with findings automatically extracted from published literature on gene interactions. We matched these findings, filtered by science, with unfiltered gene interactions measured by the massive LINCS L1000 high-throughput experiment to identify and counteract sources of bias. Our calculus is built on easily extracted publication meta-data regarding the position of a scientific claim within the web of prior knowledge, and its breadth of support across institutions, authors and communities, revealing that scientifically focused but socially and institutionally independent research activity is most likely to replicate. These findings recommend policies that go against the common practice of channeling biomedical research funding into centralized research consortia and institutes rather than dispersing it more broadly. Our results demonstrate that robust scientific findings hinge upon a delicate balance of shared focus and independence, and that this complex pattern can be computationally exploited to decode bias and predict the replicability of published findings. These insights provide guidance for scientists navigating the research literature and for science funders seeking to improve it.
