What is the Role of Large Language Models in the Evolution of Astronomy Research?

Morgan Fouesneau, Ivelina G. Momcheva, Urmila Chadayammuri, Mariia Demianenko, Antoine Dumont, Raphael E. Hviding, K. Angelique Kahle, Nadiia Pulatova, Bhavesh Rajpoot, Marten B. Scheuck, Rhys Seeburger, Dmitry Semenov, Jaime I. Villaseñor

TL;DR

The study investigates how large language models can support astronomy research by combining an internal three-month experimental evaluation of diverse LLM services with an institute-wide survey. It finds substantial potential in tasks such as code generation, literature summarization, and drafting, but highlights persistent issues with hallucinations, context limitations, and the need for critical human oversight. Ethics, copyright, equity, and policy considerations emerge as central to responsible use, prompting concrete recommendations for individuals, publishers, and the scientific community. The work underscores that LLMs are tools to augment rather than replace domain expertise, and their value lies in well-managed integration into rigorous scientific workflows. The practical impact is a roadmap for leveraging AI-assisted tasks to boost productivity while maintaining scientific integrity and fairness.

Abstract

ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields, offering powerful tools for a wide range of applications. These models, commonly trained on vast datasets, exhibit human-like text generation capabilities, making them useful for research tasks such as ideation, literature review, coding, drafting, and outreach. We conducted a study involving 13 astronomers at different career stages and research fields to explore LLM applications across diverse tasks over several months and to evaluate their performance in research-related activities. This work was accompanied by an anonymous survey assessing participants' experiences and attitudes towards LLMs. We provide a detailed analysis of the tasks attempted and the survey answers, along with specific output examples. Our findings highlight both the potential and limitations of LLMs in supporting research while also addressing general and research-specific ethical considerations. We conclude with a series of recommendations, emphasizing the need for researchers to complement LLMs with critical thinking and domain expertise, ensuring these tools serve as aids rather than substitutes for rigorous scientific inquiry.

Paper Structure (23 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Summary of southworth2020binarystarscheatsheet by ChatGPT-4. The chatbot identified the main properties of the stellar binaries discussed in the paper (top) and produced an in-depth content analysis (bottom). The analysis lists the sections of the paper that focus on different concepts and references page numbers, tables, and figures.
  • Figure 2: ChatGPT-4 on summarizing smith, a US decadal white paper on cloud technologies in science. The chatbot could draw on general knowledge about cloud computing from other scientific disciplines to recommend steps to advance cloud adoption. It also identified challenges not mentioned in the paper (Fig. 3).
  • Figure 3: ChatGPT-4 on summarizing smith, a US decadal white paper on cloud technologies in science. The chatbot could draw on general knowledge about cloud computing from other scientific disciplines to recommend steps to advance cloud adoption (Fig. 2). It also identified challenges not mentioned in the paper.
  • Figure 4: ChatGPT-4 assisting with debugging. This example shows how LLMs can help users understand and debug a piece of code. Here, ChatGPT-4 provides corrections with accompanying explanations.
  • Figure 5: GitHub Copilot generated accurate documentation, including variable typing, for an example Python function. LLMs can take over tedious tasks such as documentation, leading to a higher quality standard for source code.
  • ...and 1 more figure