An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process

Nicholas M. Synovic, Karolina Ryzka, Alessandra V. Vellucci Solari, Kenny Lyons, James C. Davis, George K. Thiruvathukal

Abstract

Deep learning has achieved recognition for its impact within the natural sciences; however, scientists are inhibited by the prohibitive technical cost and computational complexity of training project-specific models from scratch. Following software engineering community guidance, natural scientists are reusing pre-trained deep learning models (PTMs) to amortize these costs. While prior works recommend PTM reuse patterns, to our knowledge little work has been done to empirically evaluate their usage and impact within the natural sciences. We present the first empirical study of PTM reuse patterns in the natural sciences, quantifying the utilization and impact of conceptual, adaptation, and deployment reuse within the scientific process. Leveraging an automated, large language model-driven pipeline, we analyze 17,511 peer-reviewed, open-access papers spanning January 1st, 2000 to December 10th, 2025 to identify PTM reuse by scientific field, the associated reuse patterns, and the impact of PTM integration into the scientific process. Our results show that "Biochemistry, Genetics and Molecular Biology" has outpaced other natural science fields in PTM reuse, that "adaptation" reuse is the most prevalent PTM reuse pattern across all natural science fields, and that the "Test" stage of the scientific process has been most impacted by PTM integration. This aligns with the growing interest in leveraging computational methods to conduct high-throughput, data-driven scientific research. Our work characterizes current PTM reuse practices within the natural sciences, evaluates their impact on the scientific process, and establishes a foundation for future work on the implementation and broader scientific implications of PTM reuse.

Paper Structure (27 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Conceptual overview of PTM integration into the scientific workflow. The scientific process—observation, hypothesis formulation, deduction, testing, and evaluation—can incorporate PTMs at multiple stages to support data analysis, model development, and experimentation. This integration is enabled by a broader PTM ecosystem consisting of frameworks, fine-tuning algorithms, and model registries, which allow scientists to reuse existing models through adaptation, conceptual, and deployment reuse patterns.
  • Figure 2: Overview of the methodology used to identify and analyze peer-reviewed open-access natural science publications in mega-journals. The pipeline queries mega-journal databases, enriches retrieved papers with metadata from OpenAlex, and analyzes each document to identify deep learning (DL) usage, reused pre-trained models (PTMs), and PTM reuse patterns. Manually curated ground-truth labels are used to guide and refine the prompting strategy.
  • Figure 3: Results of the mega-journal paper filtering process. Our queries returned 17,511 papers between 2000 and 2025, of which 13,815 papers had citations. After filtering for natural science fields, 4,384 remained.
  • Figure 4: Number of papers using deep learning per year, 2017-2025, across major natural science disciplines. Panels (A)-(H) correspond to individual fields. The number of papers leveraging deep learning increases steadily across most disciplines. By 2024, the combined output of fields outside "Biochemistry, Genetics and Molecular Biology" exceeds the output within that field.
  • Figure 5: Number of papers reusing pre-trained models (PTMs) per year, 2017-2025, across major natural science disciplines. Panels (A)-(H) correspond to individual fields. PTM reuse increases steadily across most disciplines. Unlike overall deep learning usage, "Biochemistry, Genetics and Molecular Biology" remains the dominant field until 2025, when the combined output of other fields exceeds it.
  • ...and 2 more figures
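
The filtering counts reported in the Figure 3 caption can be turned into retention rates with a few lines of Python. This is a minimal sketch for illustration, not the authors' pipeline: the stage labels, the `FUNNEL` list, and the `retention` helper are hypothetical names, and it assumes the natural science filter was applied to the 13,815 papers that had citations.

```python
# Paper counts taken from the Figure 3 caption; stage labels are paraphrased.
FUNNEL = [
    ("returned by mega-journal queries", 17_511),
    ("with citations", 13_815),
    ("in natural science fields", 4_384),
]


def retention(funnel):
    """Return (stage, count, share of previous stage, share of first stage)."""
    first = funnel[0][1]
    previous = first
    rows = []
    for stage, count in funnel:
        rows.append((stage, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    # Print one line per stage: label, count, step retention, overall retention.
    for stage, count, step, overall in retention(FUNNEL):
        print(f"{stage:<35}{count:>8,}{step:>9.1%}{overall:>9.1%}")
```

Under that assumption, roughly a quarter of the papers returned by the queries remain after both filters.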