Who's actually being Studied? A Call for Population Analysis in Software Engineering Research
Jefferson Seide Molléri
TL;DR
The paper tackles the problem of vague or missing population definitions in empirical software engineering (ESE), arguing that valid generalization hinges on explicit target populations and population frames. It distinguishes generalizability from transferability and analyzes challenges across individuals, organizations, projects, and artifacts, highlighting data limitations, definitional ambiguities, and biases. It proposes a pragmatic set of guidelines (defining population boundaries, leveraging diverse data sources, cross-validating population estimates, and employing advanced sampling techniques such as snowballing and stratification) to improve the rigor and external validity of ESE studies. By formalizing population analysis, the work aims to enhance the applicability and transferability of empirical findings across software engineering contexts.
Abstract
Population analysis is crucial for ensuring that empirical software engineering (ESE) research is representative and its findings are valid. Yet, there is a persistent gap between sampling processes and the holistic examination of populations, which this position paper addresses. We explore challenges ranging from analysing populations of individual software engineers to those of organizations and projects. We discuss the interplay between generalizability and transferability and advocate for appropriate population frames. We also make the case for improved population analysis, with the aim of enhancing the empirical rigor and external validity of ESE research.
