
An Interdisciplinary Outlook on Large Language Models for Scientific Research

James Boyko, Joseph Cohen, Nathan Fox, Maria Han Veiga, Jennifer I-Hsiu Li, Jing Liu, Bernardo Modenesi, Andreas H. Rauch, Kenneth N. Reid, Soumi Tribedi, Anastasia Visheratina, Xin Xie

TL;DR

Large Language Models (LLMs) offer unprecedented scalability for consuming and generating scientific text but raise concerns about bias, hallucinations, privacy, reproducibility, and environmental impact. The authors provide a cross-disciplinary assessment, categorizing applications into ideation, information review, coding, and writing, and detailing domain-specific tasks across biology, chemistry, engineering, environmental science, health, materials, mathematics, and social sciences. They compare general pre-trained, fine-tuned, and domain-specific LLMs, highlighting successes (e.g., BioBERT, GatorTron, MatSciBERT, MOFormer) and the need for careful adaptation, evaluation, and responsible use. The paper argues that, with disciplined governance, domain adaptation, and transparency, LLMs can accelerate discovery, while cautioning that they can be a boundary as well as a boon to scientific progress.

Abstract

In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision. We examine how LLMs augment scientific inquiry, offering concrete examples such as accelerating literature review by summarizing vast numbers of publications, enhancing code development through automated syntax correction, and refining the scientific writing process. Simultaneously, we articulate the challenges LLMs face, including their reliance on extensive and sometimes biased datasets, and the potential ethical dilemmas stemming from their use. Our critical discussion extends to the varying impacts of LLMs across fields, from the natural sciences, where they help model complex biological sequences, to the social sciences, where they can parse large-scale qualitative data. We conclude by offering a nuanced perspective on how LLMs can be both a boon and a boundary to scientific progress.

Paper Structure (25 sections, 1 figure)

This paper contains 25 sections and 1 figure.

Figures (1)

  • Figure 1: Number of Large Language Model publications. Scopus search for "large PRE/1 language PRE/1 model" within article titles, abstracts, or keywords. Search carried out on 3 November 2023.