Accuracy Assessment of OpenAlex and Clarivate Scholar ID with an LLM-Assisted Benchmark
Renyu Zhao, Yunxin Chen
TL;DR
This study addresses scholarly name disambiguation by comparing OpenAlex and Clarivate Scholar IDs against a search-enhanced, LLM-assisted benchmark. A rigorously annotated reference dataset is built from 900 authors per region (China, the USA, Europe) across three disciplines, drawn from Q1 WOS journals for 2019–2023. The publications linked to each ID are then filtered to Q1 WOS journals, and precision and recall are computed against the annotations. The findings reveal region- and field-specific weaknesses in both ID systems: Clarivate suffers from both under-merging and over-merging, particularly for Chinese and European authors, while OpenAlex shows substantial data-quality issues and lower precision, although recall is strong in some fields. The results inform the practical use of scholar IDs in SciSci analyses and highlight where data quality and disambiguation in large-scale bibliographic databases need to improve.
Abstract
In quantitative science of science (SciSci) studies, accurately identifying individual scholars is essential for reliable data analysis. However, the variability in how names are represented (owing to common names, abbreviations, and differing spelling conventions) complicates this task. Although identifier systems such as ORCID are being developed, many scholars remain unregistered, and many publications are missing from ORCID records. Scholarly databases such as Clarivate and OpenAlex have introduced their own ID systems as preliminary name-disambiguation solutions. This study evaluates how well these systems perform across different groups of scholars, to determine their suitability for various application scenarios. We sampled authors publishing in top-quartile (Q1) Web of Science (WOS) journals, stratified by country, discipline, and number of corresponding-author papers. For each group, we selected 100 scholars and meticulously annotated all of their papers using a search-enhanced large language model (LLM) method. Using these annotations, we identified each scholar's corresponding IDs in OpenAlex and Clarivate, extracted all papers associated with those IDs, filtered them to Q1 WOS journals, and calculated precision and recall against the annotated dataset.
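The precision and recall computation described above can be sketched for a single scholar as follows. This is a minimal illustration, not the authors' implementation: it assumes papers are matched across sources by DOI, that each paper carries a flag marking whether its journal is in the WOS Q1 tier, and that an empty set scores 0.0. The `Paper` class and `precision_recall` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    doi: str          # assumption: DOI is the key used to match records across sources
    journal_q1: bool  # assumption: True if the journal is in the WOS Q1 tier


def precision_recall(id_papers, annotated_dois):
    """Score one scholar ID against the annotated ground truth (hypothetical helper).

    id_papers: papers linked to the scholar's ID in OpenAlex or Clarivate.
    annotated_dois: DOIs of the scholar's Q1 WOS papers from the annotation step.
    """
    # Keep only Q1 WOS papers so both sides cover the same journal set.
    retrieved = {p.doi for p in id_papers if p.journal_q1}
    truth = set(annotated_dois)
    hits = len(retrieved & truth)
    # Over-merging adds other people's papers to the ID, which lowers precision.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Under-merging leaves some of the scholar's papers under other IDs, which lowers recall.
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    linked = [
        Paper("10.1/a", True),
        Paper("10.1/b", True),
        Paper("10.1/c", True),   # not in the annotation: counts against precision
        Paper("10.1/d", False),  # non-Q1 journal: filtered out before scoring
    ]
    annotated = {"10.1/a", "10.1/b", "10.1/e", "10.1/f"}
    p, r = precision_recall(linked, annotated)
    print(f"precision={p:.3f} recall={r:.3f}")  # precision = 2/3, recall = 2/4
```

Per-author scores like these would still need to be aggregated within each country, discipline, and paper-count group. Whether the study averages per-author values or pools counts is not stated here, so that step is left out.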
