Investigating Industry--Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences
Kazuhiro Yamauchi, Marie Katsurai
TL;DR
The paper studies industry--academia collaboration in AI through a PDF-based bibliometric analysis of AAAI and IJCAI proceedings from 2010 to 2023. Its pipeline extracts bibliographic data directly from PDFs using GROBID, normalizes affiliations via S2AFF with the Research Organization Registry (ROR), and classifies institutions as academic or industrial, identifying 1,919 industry--academia collaborative papers. Collaborations surged from around 2017–2020 onward, with China-based institutions (e.g., Microsoft Research Asia and Alibaba) at the forefront. First authors are predominantly from academia, and the text of collaborative papers is not markedly different from that of noncollaborative ones: SciBERT-based classification reaches a best F1 of only about $0.60$. The method works around the missing articles and omitted metadata that affect database-driven studies of conference literature, and the results give a picture of global collaboration patterns and their potential drivers. The authors acknowledge limitations and propose methodological refinements and extension to other domains.
Abstract
This study presents a bibliometric analysis of industry--academia collaboration in artificial intelligence (AI) research, focusing on papers from two major international conferences, AAAI and IJCAI, from 2010 to 2023. Most previous studies have relied on publishers and other databases to analyze bibliographic information. However, these databases have problems, such as missing articles and omitted metadata. Therefore, we adopted a novel approach to extract bibliographic information directly from the article PDFs: we examined 20,549 articles and identified the collaborative papers through a classification process of author affiliation. The analysis explores the temporal evolution of collaboration in AI, highlighting significant changes in collaboration patterns over the past decade. In particular, this study examines the role of key academic and industrial institutions in facilitating these collaborations, focusing on emerging global trends. Additionally, a content analysis using document classification was conducted to examine the type of first author in collaborative research articles and explore the potential differences between collaborative and noncollaborative research articles. The results showed that, in terms of publication, collaborations are mainly led by academia, but their content is not significantly different from that of others. The affiliation metadata are available at https://github.com/mm-doshisha/ICADL2024.
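The affiliation-based identification step described above can be illustrated with a minimal sketch. It assumes each author's affiliations have already been extracted, normalized, and labeled as "academia" or "industry"; the `Paper` class, the function names, and the sample records are hypothetical. The rule used here (a paper is collaborative when its byline includes at least one academic and at least one industrial affiliation) is an assumption for illustration, not necessarily the authors' exact criterion.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int
    # One entry per author, in byline order. Each entry is the set of
    # sector labels ("academia" / "industry") of that author's affiliations.
    author_sectors: list[set[str]]


def is_collaborative(paper: Paper) -> bool:
    """Assumed rule: at least one academic and one industrial affiliation."""
    sectors = set().union(*paper.author_sectors)
    return {"academia", "industry"} <= sectors


def first_author_type(paper: Paper) -> str:
    """Sector of the first author: academia, industry, both, or unknown."""
    if not paper.author_sectors:
        return "unknown"
    first = paper.author_sectors[0]
    has_academia = "academia" in first
    has_industry = "industry" in first
    if has_academia and has_industry:
        return "both"
    if has_academia:
        return "academia"
    if has_industry:
        return "industry"
    return "unknown"


def collaborations_per_year(papers: list[Paper]) -> Counter:
    """Count collaborative papers per publication year."""
    return Counter(p.year for p in papers if is_collaborative(p))


# Hypothetical records, for illustration only (not data from the study).
PAPERS = [
    Paper("A", 2019, [{"academia"}, {"industry"}]),
    Paper("B", 2019, [{"academia"}, {"academia"}]),
    Paper("C", 2021, [{"industry"}, {"academia", "industry"}]),
    Paper("D", 2021, [{"industry"}]),
]

collaborative = [p for p in PAPERS if is_collaborative(p)]
yearly_counts = collaborations_per_year(PAPERS)
first_author_counts = Counter(first_author_type(p) for p in collaborative)
```

Keeping the sector labels per author, rather than per paper, makes it possible to answer both questions raised in the abstract from the same records: which papers are collaborative, and which sector the first author belongs to.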
