Investigating Industry--Academia Collaboration in Artificial Intelligence: PDF-Based Bibliometric Analysis from Leading Conferences

Kazuhiro Yamauchi, Marie Katsurai

TL;DR

The paper examines industry--academia collaboration in AI through a PDF-based bibliometric analysis of AAAI and IJCAI proceedings from 2010 to 2023. Its pipeline extracts bibliographic data directly from PDFs using GROBID, normalizes affiliations via S2AFF against the Research Organization Registry (ROR), and classifies institutions as academic or industrial, identifying 1,919 industry--academia collaborative papers among the 20,549 articles examined. The findings show a surge in collaborations from 2017–2020 onward, with China-based institutions (e.g., Microsoft Research Asia and Alibaba) at the forefront. First authors are predominantly from academia, and the text of collaborative papers is not markedly different from that of non-collaborative ones: a SciBERT-based classifier separating the two reaches a best F1 of about $0.60$. The work offers a scalable way to work around metadata gaps in conference literature and gives insight into global collaboration patterns and their potential drivers. It also acknowledges limitations and proposes methodological refinement and extension to other domains.

Abstract

This study presents a bibliometric analysis of industry--academia collaboration in artificial intelligence (AI) research, focusing on papers from two major international conferences, AAAI and IJCAI, from 2010 to 2023. Most previous studies have relied on publishers and other databases to analyze bibliographic information. However, these databases have problems, such as missing articles and omitted metadata. Therefore, we adopted a novel approach to extract bibliographic information directly from the article PDFs: we examined 20,549 articles and identified the collaborative papers through a classification process of author affiliation. The analysis explores the temporal evolution of collaboration in AI, highlighting significant changes in collaboration patterns over the past decade. In particular, this study examines the role of key academic and industrial institutions in facilitating these collaborations, focusing on emerging global trends. Additionally, a content analysis using document classification was conducted to examine the type of first author in collaborative research articles and explore the potential differences between collaborative and noncollaborative research articles. The results showed that, in terms of publication, collaborations are mainly led by academia, but their content is not significantly different from that of others. The affiliation metadata are available at https://github.com/mm-doshisha/ICADL2024.
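The paper-level labelling described above can be illustrated with a minimal sketch. It assumes each author's institution has already been tagged as academic or industrial (the paper does this with the flowchart in Figure 1, which is not reproduced here), and it simplifies to one institution type per author. The function names, label strings, and sample data are hypothetical and are not taken from the authors' code.

```python
from collections import Counter

ACADEMIC = "academic"      # hypothetical label for an academic institution
INDUSTRIAL = "industrial"  # hypothetical label for an industrial institution


def classify_paper(affiliation_types):
    """Label a paper from the institution types of its authors (in author order)."""
    kinds = set(affiliation_types)
    # Papers with no usable affiliation, or with an unrecognized type, stay unlabeled.
    if not kinds or not kinds <= {ACADEMIC, INDUSTRIAL}:
        return "unknown"
    if kinds == {ACADEMIC}:
        return "academia_only"
    if kinds == {INDUSTRIAL}:
        return "industry_only"
    # Both types are present: an industry--academia collaborative paper.
    return "collaboration"


def first_author_type(affiliation_types):
    """Return the institution type of the first author, or None if missing."""
    return affiliation_types[0] if affiliation_types else None


# Made-up example data: paper id -> institution type of each author.
papers = {
    "p1": [ACADEMIC, ACADEMIC],
    "p2": [ACADEMIC, INDUSTRIAL, INDUSTRIAL],
    "p3": [INDUSTRIAL],
    "p4": [],
}

labels = {pid: classify_paper(types) for pid, types in papers.items()}
label_counts = Counter(labels.values())
```

Keeping an explicit `unknown` label, rather than forcing every paper into one of the three groups, makes it visible how many papers the affiliation extraction could not resolve.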

Paper Structure

This paper contains 18 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: The flowchart of the classification of whether the institution is academic or industrial.
  • Figure 2: Time series changes in the number of papers and industry--academia collaborative papers for each conference.
  • Figure 3: The proportion of papers authored solely by academia, solely by industry, and through industry--academia collaborations.
  • Figure 4: Network of research institutions that have industry--academia collaborative papers, where edges are depicted between two institutions only if the number of their collaborative papers is greater than five.
  • Figure 5: Annual differences in affiliation types of first authors in terms of the number of industry--academia collaborative papers.
  • ...and 1 more figure
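The edge rule in the Figure 4 caption (an edge is drawn only when two institutions share more than five collaborative papers) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it counts every pair of distinct institutions on a paper, whereas the paper may restrict edges to academic-industrial pairs. The function name, the `min_papers` parameter, and the institution names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(papers, min_papers=5):
    """Count shared papers per institution pair and keep pairs with more than `min_papers`."""
    pair_counts = Counter()
    for institutions in papers:
        # Deduplicate and sort so each unordered pair has a single canonical key.
        for a, b in combinations(sorted(set(institutions)), 2):
            pair_counts[(a, b)] += 1
    # "Greater than five" is a strict inequality, so a pair with exactly five papers is dropped.
    return {pair: n for pair, n in pair_counts.items() if n > min_papers}


# Made-up example: each inner list holds the institutions on one collaborative paper.
example_papers = (
    [["University A", "Company B"]] * 6
    + [["University A", "Company C"]] * 5
)

edges = collaboration_edges(example_papers)
```

In this example only the University A / Company B pair survives, because the other pair has exactly five shared papers and the threshold is strict.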