Bridging AI and Science: Implications from a Large-Scale Literature Analysis of AI4Science

Yutong Xie, Yijun Pan, Hua Xu, Qiaozhu Mei

TL;DR

The paper tackles the gap between AI and science by constructing a large-scale, data-driven map of the AI4Science landscape. It uses LLMs to extract detailed scientific problems and AI methods from a broad set of top science and AI publications (2014–2024), clusters them semantically, and represents their connections as a bipartite graph. Through link prediction experiments, it demonstrates that AI4Science connections can be predicted and that new, previously underexplored links emerge, highlighting opportunities to broaden AI integration in science. The work provides a public dataset, code, and tools to foster interdisciplinary collaboration and accelerate discovery through deeper AI adoption across scientific domains.
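The bipartite representation and the link prediction idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cluster labels are invented, the helper names `build_bipartite` and `link_score` are hypothetical, and the path-counting score is a generic neighborhood heuristic, not the prediction models evaluated in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical (problem cluster, method cluster) labels, one pair per paper.
# These are illustrative values, not entries from the released dataset.
papers = [
    ("protein structure", "graph neural networks"),
    ("protein structure", "transformers"),
    ("climate modeling", "transformers"),
    ("climate modeling", "convolutional networks"),
    ("materials discovery", "graph neural networks"),
]


def build_bipartite(pairs):
    """Return edge weights (paper counts) and adjacency sets for both sides."""
    weights = Counter(pairs)
    problem_to_methods = defaultdict(set)
    method_to_problems = defaultdict(set)
    for problem, method in weights:
        problem_to_methods[problem].add(method)
        method_to_problems[method].add(problem)
    return weights, problem_to_methods, method_to_problems


def link_score(problem, method, problem_to_methods, method_to_problems):
    """Count length-3 paths problem -> m' -> p' -> method.

    A higher count means the problem shares methods with other problems
    that already use the candidate method.
    """
    score = 0
    for m_prime in problem_to_methods.get(problem, set()):
        for p_prime in method_to_problems.get(m_prime, set()):
            if p_prime != problem and method in problem_to_methods.get(p_prime, set()):
                score += 1
    return score


weights, p2m, m2p = build_bipartite(papers)

# Unweighted degree: number of distinct method clusters linked to a problem cluster.
problem_degree = {problem: len(methods) for problem, methods in p2m.items()}

print(link_score("materials discovery", "transformers", p2m, m2p))            # 1
print(link_score("materials discovery", "convolutional networks", p2m, m2p))  # 0
```

In this toy graph, "materials discovery" shares a method with "protein structure", which already uses transformers, so that unobserved link scores higher than the link to convolutional networks.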

Abstract

Artificial Intelligence has proven to be a transformative tool for advancing scientific research across a wide range of disciplines. However, a significant gap still exists between AI and scientific communities, limiting the full potential of AI methods in driving broad scientific discovery. Existing efforts in identifying and bridging this gap have often relied on qualitative examination of small samples of literature, offering a limited perspective on the broader AI4Science landscape. In this work, we present a large-scale analysis of the AI4Science literature, starting by using large language models to identify scientific problems and AI methods in publications from top science and AI venues. Leveraging this new dataset, we quantitatively highlight key disparities between AI methods and scientific problems, revealing substantial opportunities for deeper AI integration across scientific disciplines. Furthermore, we explore the potential and challenges of facilitating collaboration between AI and scientific communities through the lens of link prediction. Our findings and tools aim to promote more impactful interdisciplinary collaborations and accelerate scientific discovery through deeper and broader AI integration. Our code and dataset are available at: https://github.com/charles-pyj/Bridging-AI-and-Science.

Paper Structure

This paper contains 56 sections, 4 equations, 7 figures, 19 tables.

Figures (7)

  • Figure 1: Illustration of LLM-based extraction of scientific problem $p_i$, AI method $m_i$, and AI usage $u_i$ from an example publication. The extractions are then semantically projected, clustered, and labeled to form the AI4Science landscape (Fig. 2), as well as to construct the bipartite graph (Fig. 4a).
  • Figure 2: 2D projection maps of the AI4Science landscape: (a) the extracted scientific problems $\{p_i\}$ and (b) AI methods $\{m_i\}$. Each dot represents a publication, with coordinates obtained by reducing the dimensionality of the problem/method embeddings. Green dots correspond to AI4Science work, which uses AI methods to address scientific problems; orange dots show papers addressing scientific problems without using AI methods; purple dots are papers using AI to address non-scientific problems. The visualizations reveal a noticeable discrepancy in the distribution of AI4Science work (green) versus non-AI4Science work (orange and purple) in both the problem and method spaces.
  • Figure 3: Scientific problem clusters (a) and AI method clusters (b) visualized as scatter plots. In both plots, the $x$-axis represents the total number of publications in each cluster, while the $y$-axis reflects the number of interdisciplinary AI4Science publications in each cluster. The black lines show the regression fits over the clusters. Clusters above the line indicate regions well-explored for AI4Science. Clusters falling below the line highlight under-explored regions, where the integration of AI and science remains limited. These areas represent potential opportunities for further interdisciplinary collaboration.
  • Figure 4: The AI-Science bipartite graph and node distributions. (a) The bipartite graph with scientific problem clusters and AI method clusters as nodes. The size of each node corresponds to its unweighted degree, representing the number of AI methods applied to a scientific problem, or the number of scientific problems utilizing an AI method. For visualization clarity, edges representing fewer than four publications and the resulting isolated nodes are hidden. The distribution of links indicates imbalanced connectivity. (b-c) The degree of AI method nodes follows a log-normal distribution, and the degree distribution of scientific problem nodes is even more heavy-tailed, suggesting the existence of "hubs" in linking AI and science.
  • Figure 5: Distribution of scientific problems of all publications, annotated with gpt-4o-2024-08-06 summary.
  • ...and 2 more figures
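The reading of the regression line described in the Figure 3 caption (clusters below the line are under-explored) can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster names and counts are invented, `fit_line` is a hypothetical helper, and ordinary least squares on raw counts is used only for simplicity, since the caption does not specify the regression form.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical clusters: name -> (total publications, AI4Science publications).
clusters = {
    "A": (100, 10),
    "B": (200, 20),
    "C": (300, 60),
    "D": (400, 30),
}

xs = [total for total, _ in clusters.values()]
ys = [ai4sci for _, ai4sci in clusters.values()]
slope, intercept = fit_line(xs, ys)

# Residual = observed AI4Science count minus the count predicted by the line.
residuals = {
    name: ai4sci - (slope * total + intercept)
    for name, (total, ai4sci) in clusters.items()
}

# Negative residuals mark clusters below the line (under-explored),
# listed from most to least under-explored.
under_explored = sorted(
    (name for name, r in residuals.items() if r < 0),
    key=lambda name: residuals[name],
)
print(under_explored)
```

With these invented counts, cluster C sits above the fitted line, while A, B, and D fall below it, with D the furthest below.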