
Scientific Computing with Large Language Models

Christopher Culver, Peter Hicks, Mihailo Milenkovic, Sanjif Shanmugavelu, Tobias Becker

TL;DR

The paper surveys how large language models are repurposed for scientific computing, focusing on two pathways: processing scientific text and learning specialized languages for molecular and physical processes. It outlines Transformer-based architectures, scaling trends, and inference accelerators, and reviews concrete applications across molecules, proteins, genomics, medicine, math, and physics. The findings indicate substantial speedups in property prediction, molecular design, and problem solving, while highlighting challenges in explainability, hallucinations, and regulatory approval. The authors argue that advances in domain-specific fine-tuning, retrieval-augmented generation, and hardware for high-throughput inference are essential for real-world deployment in science.

Abstract

We provide an overview of the emergence of large language models for scientific computing applications. We highlight use cases that involve natural language processing of scientific documents and specialized languages designed to describe physical systems. For the former, chatbot-style applications appear in medicine, mathematics, and physics and can be used iteratively with domain experts for problem solving. We also review specialized languages within molecular biology (the languages of molecules, proteins, and DNA), where language models are being used to predict properties and even create novel physical systems at much faster rates than traditional computing methods.
