Towards Industrial Convergence : Understanding the evolution of scientific norms and practices in the field of AI

Antoine Houssard

TL;DR

The paper investigates whether industrial domination and frequent academia-to-industry mobility in AI drive a convergence of norms and practices. It combines data from the Papers with Code platform, OpenAlex, arXiv metadata, and GitHub to compare academic, industrial, and mixed teams across four AI fields, using metrics such as topical diversity ($H(P_g) = -\sum_{i=1}^{n} p(T_i) \log p(T_i)$), topic pairing via Uzzi's method, lexical diversity via Zipf's law $F(n)=\alpha/n$, and labor measures from DOA. The results show that purely academic work remains more topically diverse and uses simpler code, while industrial and mixed teams align with industrial goals and achieve greater early impact across both artifacts (papers and code), with convergence mainly mediated by mixed collaborations. The study highlights an asymmetrical convergence in which industry shapes direction and practices, and it calls for strengthening academic AI research to preserve novelty, openness, and long-term heuristic value.
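As a rough illustration of the topical diversity metric quoted above, the following minimal sketch computes $H(P_g)$ from a list of topic labels. The function name `shannon_entropy`, the use of the natural logarithm, and the estimation of $p(T_i)$ as a relative frequency are assumptions made for this example; the paper's actual pipeline may differ.

```python
import math
from collections import Counter


def shannon_entropy(topics):
    """Shannon entropy H = -sum_i p(T_i) * log p(T_i) over topic labels.

    Assumption: p(T_i) is the relative frequency of topic T_i in the group,
    and the natural logarithm is used.
    """
    counts = Counter(topics)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty group has no diversity to measure
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical topic labels for two groups of papers.
focused_group = ["vision", "vision", "vision", "vision"]
diverse_group = ["vision", "nlp", "robotics", "speech"]

print(shannon_entropy(focused_group))  # a single topic gives zero entropy
print(shannon_entropy(diverse_group))  # four equally likely topics give log(4)
```

A higher value means that the group's papers are spread more evenly over more topics, which is how the figure captions below read the metric.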

Abstract

In the field of artificial intelligence (AI) research, there seems to be a rapprochement between academics and industrial forces. The aim of this study is to assess whether, and to what extent, industrial domination in the field and the ever more frequent moves between academia and industry have resulted in the adoption of industrial norms and practices by academics. Using bibliometric information and data on scientific code, we aimed to understand academic and industrial researchers' practices: how they choose, invest in, and succeed across multiple concurrent artifacts. Our results show that, although both actors write papers and code, their practices and the norms guiding them differ greatly. Nevertheless, the presence of industrial researchers in academic studies appears to lead to practices leaning toward the industrial side, but also to greater success in both artifacts, suggesting that if convergence is occurring, it is passing through those mixed teams rather than through purely academic or purely industrial studies.

Paper Structure

This paper contains 19 sections, 5 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Institution type and labels in the dataset. A) Top 25 most represented industrial institutions in the sample. B) Top 25 most represented academic institutions in the sample. C) Distribution of studies across groups: "Company" represents studies conducted only by industrial actors, "Public" represents studies conducted only by academic or other non-profit / governmental actors, and "Mixed" represents studies conducted with researchers from both groups.
  • Figure 2: Topical diversity in academic and industrial papers. A) Shannon entropy of the OpenAlex topics of journals (excluding pre-publication venues); a higher value means higher topical diversity. The Shannon entropy is computed for each AI task subgroup and across groups. B) Shannon entropy of paper topics using OpenAlex Topics. C) Topic-pairing z-score in academic and industrial papers, highlighting the differences in topic combination. The z-score is computed by comparing the occurrence of topic pairs against a baseline derived from network rewiring, using Uzzi's method.
  • Figure 3: Programming languages in academic and industrial repositories. A) Frequency of file programming language types for purely academic, mixed, and purely industrial repositories. B) Quartile-to-quartile plot of the number of programming languages within the repositories for purely academic, mixed, and purely industrial repositories; each point represents the group quartile and the dashed line the quartile for the entire population. C) Bigrams of programming languages within academic and industrial (at least one industrial contributor) repositories. Each bar represents the probability (incidence, Eq. 2) of encountering the pair within a repository of the group. D) Trigrams of programming languages within academic and industrial repositories.
  • Figure 4: Venue choice for academic and industrial papers. A) Density distribution function for the academic-specific, industrial-specific, and common venues (excluding pre-publication venues). The inset shows the distribution of public and private articles within those categories. B) Histogram of time to publication for academic and industrial (at least one industrial author) papers. The time to publication is the time between the first upload on arXiv and the publication of the article. The dashed horizontal line represents a truncated normal fit and the dashed vertical line the median of the distributions. C) Bar plot of the publication status for academic, industrial, and mixed teams.
  • Figure 5: Repository presentation metrics. A) Frequency/probability of encountering different elements within academic, mixed, and industrial repositories. All repositories are considered except for the "install" variable, which is restricted to repositories using Python. B) Zipf's law for the README files of the top repositories (top 100 for each group, sorted by number of commits) for academic, mixed, and industrial repositories. The figure uses the word frequency across repository READMEs. The inset displays the same metric with repositories filtered by number of stars (top 100). The $\alpha$ parameter relates to lexical diversity: a lower $\alpha$ indicates higher diversity.
  • ...and 9 more figures
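
The Zipf-law reading of README text in Figure 5 can be sketched as follows. This is only an illustrative example: the function name `fit_zipf_alpha`, the normalization of word counts to relative frequencies, and the least-squares fit of $F(n)=\alpha/n$ are all assumptions, since the fitting procedure used in the paper is not stated in the text above.

```python
from collections import Counter


def fit_zipf_alpha(words):
    """Least-squares estimate of alpha in F(n) = alpha / n.

    Assumptions: F(n) is the relative frequency of the n-th most frequent
    word, and alpha minimizes sum_n (F(n) - alpha / n) ** 2, which gives
    alpha = sum_n (F(n) / n) / sum_n (1 / n ** 2).
    """
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot fit Zipf's law on an empty word list")
    # Relative frequencies ordered by rank (rank 1 = most frequent word).
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    numerator = sum(f / n for n, f in enumerate(freqs, start=1))
    denominator = sum(1 / n ** 2 for n in range(1, len(freqs) + 1))
    return numerator / denominator


# Hypothetical README tokens (not taken from the paper's dataset).
readme_tokens = "install the model then train the model and run the demo".split()
print(fit_zipf_alpha(readme_tokens))
```

Under these assumptions, a text dominated by a few very frequent words yields a larger fitted $\alpha$ than a text whose vocabulary is used more evenly, which matches the caption's reading that lower $\alpha$ indicates higher lexical diversity.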