A Causal Inference Approach for Quantifying Research Impact
Keiichi Ochiai, Yutaka Matsuo
TL;DR
This work tackles the challenge of quantifying the impact of a technical topic on scientific fields by framing it as a causal inference problem. It introduces a four-step framework that uses Microsoft Academic Graph data and a difference-in-differences design to estimate topic-level effects on individual fields and on cross-field citations. Case studies on deep learning and comparisons with other topics show substantial, field-specific impacts: deep learning notably boosts publications in computer vision and NLP and increases cross-field citations, and its aggregate relative effect exceeds that of the other topics examined. The approach provides a principled bibliometric tool for policy analysis and investment decisions in research areas.
Abstract
Over the past decade, deep learning has had a great impact on various fields of computer science by enabling data-driven representation learning. Because a nation's science and technology policy decisions can be based on the impact of each technology, quantifying research impact is an important task. Citation counts and the impact factor can be used to measure the impact of individual research. What would have happened without the research, however, is fundamentally a counterfactual question. Thus, we propose an approach based on causal inference to quantify the research impact of a specific technical topic. We apply difference-in-differences to bibliometric data to quantify this impact. First, we identify papers on a specific technical topic using keywords or category tags from Microsoft Academic Graph, which is one of the largest academic publication datasets. Next, we build a paper citation network between technical fields. Then, we aggregate the cross-field citation counts for each research field. Finally, we estimate the impact of the technical topic on each research field by applying difference-in-differences. Evaluation results show that deep learning significantly affects computer vision and natural language processing. In addition, deep learning significantly affects cross-field citations, especially from speech recognition to computer vision and from natural language processing to computer vision. Moreover, our method reveals that the impact of deep learning was 3.1 times that of interpretability for ML models.
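The aggregation and estimation steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the treated and control groups or the event year are defined, so the function names (cross_field_counts, did_estimate), the choice of 2012 as the event year, and all numbers below are hypothetical and chosen only to show the arithmetic of a two-group, two-period difference-in-differences estimate.

```python
from collections import Counter
from statistics import mean


def cross_field_counts(citations):
    """Count citations per (citing_field, cited_field, year).

    `citations` is an iterable of (citing_field, cited_field, year) tuples.
    Same-field citations are skipped, since only cross-field links are wanted.
    """
    counts = Counter()
    for citing, cited, year in citations:
        if citing != cited:
            counts[(citing, cited, year)] += 1
    return counts


def did_estimate(treated, control, event_year):
    """Two-group, two-period difference-in-differences.

    `treated` and `control` map year -> outcome (e.g. a yearly count).
    Years before `event_year` form the pre-period, the rest the post-period.
    Returns (treated post - pre) minus (control post - pre).
    """
    def pre_post_means(series):
        pre = [v for y, v in series.items() if y < event_year]
        post = [v for y, v in series.items() if y >= event_year]
        return mean(pre), mean(post)

    t_pre, t_post = pre_post_means(treated)
    c_pre, c_post = pre_post_means(control)
    return (t_post - t_pre) - (c_post - c_pre)


# Hypothetical toy series (made-up numbers, not results from the paper).
TREATED = {2010: 100, 2011: 110, 2013: 200, 2014: 230}
CONTROL = {2010: 80, 2011: 90, 2013: 100, 2014: 110}

if __name__ == "__main__":
    print(did_estimate(TREATED, CONTROL, event_year=2012))
```

With these toy numbers the treated series rises by 110 on average and the control series by 20, so the estimate is 90. The estimate has a causal reading only under the usual parallel-trends assumption, namely that the treated series would have followed the control trend had the topic not appeared.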
