
Evidence Tetris in the Pixelated World of Validity Threats

Marvin Wyrich, Sven Apel

TL;DR

This paper tackles the challenge of prioritizing threats to validity in empirical software engineering, where researchers often rely on intuition. It introduces Evidence Tetris, a three-step framework (Threat Collection, Evidence Synthesis, and Evidence-Based Study Designs) that grounds threat prioritization in synthesized empirical evidence. Using code comprehension studies as an illustrative example, it demonstrates how to identify frequently discussed threats, synthesize the evidence on them, and apply the findings to study design decisions. The approach aims to shift the literature toward evidence-based validity discussions, improving study design, peer review, and the interpretability of findings across software engineering.

Abstract

Valid empirical studies build confidence in scientific findings. Fortunately, it is now common for software engineering researchers to consider threats to validity when designing their studies and to discuss them as part of their publication. Yet, in complex experiments with human participants, there is often an overwhelming number of intuitively plausible threats to validity -- more than a researcher can feasibly cover. Therefore, prioritizing potential threats to validity becomes crucial. We suggest moving away from relying solely on intuition for prioritizing validity threats, and propose that evidence on the actual impact of suspected threats to validity should complement intuition.
