How Scientists Use Large Language Models to Program
Gabrielle O'Brien
TL;DR
The paper investigates how scientists who code engage with code-generating large language models (Code LLMs) and which interfaces they favor: browser-based Chat or IDE-embedded Copilot. Through a university-wide survey (n=199) and 14 in-depth interviews complemented by interaction logs, the author finds that scientists use Code LLMs predominantly as information-retrieval aids for learning new languages and libraries. Verification relies largely on running the code, eyeballing outputs, and reading the generated code. The study highlights suboptimal verification practices and prevalent misconceptions about how Code LLMs work. It argues for better tooling and interfaces that distinguish retrieval from generation, signal errors more clearly, and support rigorous code validation in scientific contexts. Limitations include self-reported data and a sample drawn from a single university. Replication across diverse institutions would help generalize the findings and inform the design of robust, discipline-aware code-verification tools.
Abstract
Scientists across disciplines write code for critical activities like data collection and generation, statistical modeling, and visualization. As large language models that can generate code have become widely available, scientists may increasingly use these models during research software development. We investigate the characteristics of scientists who are early adopters of code-generating models and conduct interviews with scientists at a public, research-focused university. Through interviews and reviews of user interaction logs, we find that scientists often use code-generating models as an information retrieval tool for navigating unfamiliar programming languages and libraries. We present findings about their verification strategies and discuss potential vulnerabilities that may emerge when code generation practices unknowingly influence the parameters of scientific analyses.
