Evidence Tetris in the Pixelated World of Validity Threats
Marvin Wyrich, Sven Apel
TL;DR
This paper tackles the challenge of prioritizing threats to validity in empirical software engineering, where researchers often rely on intuition. It introduces Evidence Tetris, a three-step framework (Threat Collection, Evidence Synthesis, and Evidence-Based Study Designs) that grounds threat prioritization in synthesized empirical evidence. Using code comprehension studies as an illustrative example, it demonstrates how to identify frequently discussed threats, synthesize the evidence on them, and apply the findings to study design decisions. The approach aims to shift the literature toward evidence-based validity discussions, improving study design, peer review, and the interpretability of findings across software engineering.
Abstract
Valid empirical studies build confidence in scientific findings. Fortunately, it is now common for software engineering researchers to consider threats to validity when designing their studies and to discuss them in their publications. Yet, in complex experiments with human participants, there is often an overwhelming number of intuitively plausible threats to validity, more than a researcher can feasibly cover. Therefore, prioritizing potential threats to validity becomes crucial. We suggest moving away from relying solely on intuition for prioritizing validity threats, and propose that evidence on the actual impact of suspected threats to validity should complement intuition.
