Applying Bayesian Data Analysis for Causal Inference about Requirements Quality: A Controlled Experiment

Julian Frattini, Davide Fucci, Richard Torkar, Lloyd Montgomery, Michael Unterkalmsteiner, Jannik Fischbach, Daniel Mendez

TL;DR

The paper addresses the lack of empirical evidence on how quality defects in natural language (NL) requirements affect downstream software engineering activities. In a controlled crossover experiment, 25 participants create domain models from requirements seeded with two defects, passive voice and ambiguous pronouns, and the results are analyzed with both frequentist and Bayesian methods. The Bayesian analysis reveals nuanced, context-sensitive impacts: ambiguous pronouns strongly increase wrong associations, whereas passive voice has only a minor effect, and context factors mediate some outcomes. The work contributes a methodological framework for linking quality defects to downstream tasks, demonstrates the value of Bayesian causal inference in requirements quality research, and provides replication materials. The findings indicate where practitioners should focus quality-improvement efforts and support broader adoption of Bayesian data analysis (BDA) in software engineering research.

Abstract

It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence to the effect that requirements quality defects have on a software engineering activity that depends on this requirement. We conduct a controlled experiment in which 25 participants from industry and university generate domain models from four natural language requirements containing different quality defects. We evaluate the resulting models using both frequentist and Bayesian data analysis. Contrary to our expectations, our results show that the use of passive voice only has a minor impact on the resulting domain models. The use of ambiguous pronouns, however, shows a strong effect on various properties of the resulting domain models. Most notably, ambiguous pronouns lead to incorrect associations in domain models. Despite being equally advised against by literature and frequentist methods, the Bayesian data analysis shows that the two investigated quality defects have vastly different impacts on software engineering activities and, hence, deserve different levels of attention. Our employed method can be further utilized by researchers to improve reliable, detailed empirical evidence on requirements quality.

Paper Structure (61 sections, 3 equations, 14 figures, 4 tables)

Figures (14)

  • Figure 1: Formalization of a requirements specification R2 using passive voice
  • Figure 2: Formalization of a requirements specification R3 using an ambiguous pronoun
  • Figure 3: Reduced version of the activity-based Requirements Quality Theory (Frattini et al., 2023)
  • Figure 4: Causal assumptions about the impact of passive voice
  • Figure 5: Domain modeling task example for requirement 4
  • ...and 9 more figures