
The Rapid Growth of AI Foundation Model Usage in Science

Ana Trišović, Alex Fogelson, Janakan Sivaloganathan, Neil Thompson

TL;DR

This study delivers the first large-scale, paper-level analysis of foundation-model adoption in science, revealing near-exponential growth since 2015 and field-specific disparities. It builds the FutureTech AI in Science Database and uses a multi-faceted pipeline to identify, classify, and disambiguate model references in papers, with Bayesian corrections for false positives. The findings show vision models still predominate in adoption, open-weight models dominate, and scientists increasingly customize models, with larger models associated with higher-impact journals and more citations. The work highlights openness, access, and scale as key levers for AI-enabled science and policy.
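The TL;DR mentions Bayesian corrections for false positives in the classification pipeline, but the summary does not spell out the procedure. The block below is only a minimal sketch of one standard way to make such a correction, not the authors' method. It assumes a confusion matrix `confusion[t][p] = P(predicted p | true t)` estimated from a hand-labelled sample, of the kind shown in Figure S1. Bayes' rule turns the predicted counts into expected true counts, and the class shares are re-estimated until they stop changing. The function names and the numbers in the example are hypothetical.

```python
# Minimal sketch, NOT the authors' method: Bayes-rule re-estimation of true
# class shares from classifier-predicted counts. All names are hypothetical.


def posteriors(confusion: list[list[float]], prior: list[float]) -> list[list[float]]:
    """Return post[t][p] = P(true=t | pred=p) via Bayes' rule.

    confusion[t][p] is P(pred=p | true=t), estimated on a hand-labelled sample;
    prior[t] is the current estimate of P(true=t).
    """
    k = len(prior)
    post = [[0.0] * k for _ in range(k)]
    for p in range(k):
        evidence = sum(confusion[t][p] * prior[t] for t in range(k))
        for t in range(k):
            post[t][p] = confusion[t][p] * prior[t] / evidence if evidence > 0 else 0.0
    return post


def corrected_shares(pred_counts: list[int], confusion: list[list[float]],
                     max_iter: int = 500, tol: float = 1e-10) -> list[float]:
    """Iteratively re-estimate true class shares from predicted counts."""
    k = len(pred_counts)
    n = sum(pred_counts)
    shares = [c / n for c in pred_counts]  # start from the uncorrected shares
    for _ in range(max_iter):
        post = posteriors(confusion, shares)
        # Expected true share of class t: every paper predicted as p adds P(t | p).
        updated = [sum(post[t][p] * pred_counts[p] for p in range(k)) / n for t in range(k)]
        converged = max(abs(a - b) for a, b in zip(updated, shares)) < tol
        shares = updated
        if converged:
            break
    return shares


# Hypothetical example: class 0 = "cites only", class 1 = "uses the model".
# The classifier mislabels 10% of each class.
confusion = [[0.9, 0.1],
             [0.1, 0.9]]
shares = corrected_shares([600, 400], confusion)
print([round(s, 3) for s in shares])  # → [0.625, 0.375]
```

In this toy case, 40% of papers are flagged as using a model. Once the cite-only papers that were wrongly flagged are accounted for, the estimated true share drops to 37.5%. As a check, 0.1 × 0.625 + 0.9 × 0.375 = 0.4, which reproduces the observed rate.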

Abstract

We present the first large-scale analysis of AI foundation model usage in science, going beyond citations or keyword mentions. We find that adoption has grown rapidly, at nearly exponential rates, with the highest uptake in Linguistics, Computer Science, and Engineering. Vision models are the most widely used foundation models in science, although the share of language models is growing. Open-weight models dominate. As AI builders increase the parameter counts of their models, scientists have followed suit, but at a much slower rate: in 2013, the median foundation model built was 7.7x larger than the median one adopted in science; by 2024, this gap had grown to 26x. We also present suggestive evidence that scientists' reliance on these smaller models may be preventing them from realizing the full benefits of AI-enabled science, as papers that use larger models appear in higher-impact journals and accrue more citations.
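As we read the abstract, the size comparison divides the median parameter count of models built in a given year by the median parameter count of models adopted in science that year. The block below is a minimal sketch of that arithmetic. It assumes plain lists of parameter counts, and the helper names and example counts are hypothetical. The second function expresses the reported widening, from 7.7x in 2013 to 26x in 2024, as an implied constant yearly factor of roughly 1.117. In other words, the gap grew by about 12% per year on average.

```python
from statistics import median


def size_gap(built_params: list[float], adopted_params: list[float]) -> float:
    """How many times larger the median built model is than the median adopted one."""
    return median(built_params) / median(adopted_params)


def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant per-year factor that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years)


# Hypothetical parameter counts for a single year (illustration only).
built = [2e8, 7e8, 7e9]
adopted = [1e8, 1e8, 3e8]
print(size_gap(built, adopted))  # → 7.0

# Gaps reported in the abstract: 7.7x in 2013, 26x in 2024.
print(round(annual_growth_factor(7.7, 26, 2024 - 2013), 3))  # → 1.117
```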

Paper Structure

This paper contains 18 sections, 8 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Trends in scientific engagement with foundation models: (A) Share of publications citing, using, customizing, or releasing foundation models. Publications with multiple types are categorized by their most technically demanding category (release, then customization, then use, then citation); (B) Share of publications adopting foundation models by field, where adoption is defined as use or customization. Grey lines represent the other academic fields.
  • Figure 2: Foundation model adoption across scientific fields: (A) Share of adoptions by parameter count; (B) Share of models being built by parameter count; (C) Total adoption by model size (# parameters). Most-adopted models are listed in the bar; (D) Trends in mean model size for models built and adopted, with 25th-75th percentiles shaded; (E) Average model size adopted per year, by scientific field.
  • Figure 3: The scholarly impacts of foundation model adoptions: (A) average journal impact factor by adopted model size; (B) median citation counts by adopted model size; (C) average number of authors by adopted model size.
  • Figure 4: Share of foundation models by (A) adopted modality by field, (B) adopted modality over time, (C) built modality over time, and (D) openness of adopted models over time. In A-C, multiclass models are given equal weight per modality.
  • Figure S1: gpt-4.1-mini classifier confusion matrix, column-weighted by observed distribution.
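Two counting rules in the figure captions can be made concrete. Figure 1 assigns a paper with several engagement types to its most technically demanding type (release, then customization, then use, then citation). Figure 4 gives multiclass models equal weight per modality; we assume this means a model's single unit of weight is split evenly across its modalities. The block below is one reading of those captions, not the authors' code. The function names and example inputs are hypothetical.

```python
from collections import Counter

# Most technically demanding first, following the Figure 1 caption.
PRECEDENCE = ["release", "customization", "use", "citation"]


def top_engagement(types: list[str]) -> str:
    """Collapse a paper's engagement types into its most demanding one."""
    present = set(types)
    for category in PRECEDENCE:
        if category in present:
            return category
    raise ValueError(f"no recognised engagement type in {types!r}")


def modality_shares(models: list[list[str]]) -> dict[str, float]:
    """Share per modality; a model spanning k modalities adds 1/k to each (assumed split)."""
    weights: Counter = Counter()
    for modalities in models:
        unique = set(modalities)
        for modality in unique:
            weights[modality] += 1 / len(unique)
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()} if total else {}


print(top_engagement(["citation", "use", "customization"]))  # → customization
shares = modality_shares([["vision"], ["vision"], ["vision", "language"], ["language"]])
print(sorted(shares.items()))  # → [('language', 0.375), ('vision', 0.625)]
```

Splitting the weight keeps each model's total contribution at one, so the modality shares still sum to 1.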