Generative AI and Empirical Software Engineering: A Paradigm Shift

Christoph Treude, Margaret-Anne Storey

TL;DR

This paper examines how the emergence of LLMs and autonomous agents disrupts empirical software engineering by altering its core phenomena, data modalities, and validity considerations. It argues for rethinking foundational constructs (such as 'developer' and 'artifact'), expanding data categories to include training data, prompts, and AI outputs, and adopting adaptive, longitudinal, and interdisciplinary research methods to study AI-mediated practice. Key contributions include a framework for reframing research questions, methodological guidance for hybrid human–AI workflows, and an emphasis on data provenance, ethics, and reproducibility in dynamic AI contexts. The work aims to keep empirical software engineering rigorous and relevant as AI becomes an active collaborator in development and research.

Abstract

The adoption of large language models (LLMs) and autonomous agents in software engineering marks an enduring paradigm shift. These systems create new opportunities for tool design, workflow orchestration, and empirical observation, while fundamentally reshaping the roles of developers and the artifacts they produce. Although traditional empirical methods remain central to software engineering research, the rapid evolution of AI introduces new data modalities, alters causal assumptions, and challenges foundational constructs such as "developer", "artifact", and "interaction". As humans and AI agents increasingly co-create, the boundaries between social and technical actors blur, and the reproducibility of findings becomes contingent on model updates and prompt contexts. This vision paper examines how the integration of LLMs into software engineering disrupts established research paradigms. We discuss how it transforms the phenomena we study, the methods and theories we rely on, the data we analyze, and the threats to validity that arise in dynamic AI-mediated environments. Our aim is to help the empirical software engineering community adapt its questions, instruments, and validation standards to a future in which AI systems are not merely tools, but active collaborators shaping software engineering and its study.

Paper Structure

This paper contains 6 sections and 1 figure.

Figures (1)

  • Figure 1: New types of data amenable to empirical software engineering research in the era of generative-AI adoption: training data, prompts, and output.