Do Test and Environmental Complexity Increase Flakiness? An Empirical Study of SAP HANA

Alexander Berndt, Thomas Bach, Sebastian Baltes

TL;DR

Splitting long-running tests into smaller tests with a narrower scope can effectively decrease the negative effects of test flakiness in complex testing environments: it enables parallelization of test executions and reduces the cost of re-executions after flaky failures.

Abstract

Background: Test flakiness is a major problem in the software industry. Flaky tests fail seemingly at random without changes to the code and thus impede continuous integration (CI). Some researchers argue that all tests can be considered flaky and that tests only differ in their frequency of flaky failures. Aims: With the goal of developing mitigation strategies to reduce the negative impact of test flakiness, we study characteristics of tests and the test environment that potentially impact test flakiness. Method: We construct two datasets based on SAP HANA's test results over a 12-week period: one based on production data, the other based on targeted test executions from a dedicated flakiness experiment. We conduct correlation analysis for test and test environment characteristics with respect to their influence on the frequency of flaky test failures. Results: In our study, the average test execution time had the strongest positive correlation with the test flakiness rate (r = 0.79), which confirms previous studies. Potential reasons for higher flakiness include the larger test scope of long-running tests or test executions on a slower test infrastructure. Interestingly, the load on the testing infrastructure was not correlated with test flakiness. The relationship between test flakiness and required resources for test execution is inconclusive. Conclusions: Based on our findings, we conclude that splitting long-running tests can be an important measure for practitioners to cope with test flakiness, as it enables parallelization of test executions and also reduces the cost of re-executions. This effectively decreases the negative effects of test flakiness in complex testing environments. However, when splitting long-running tests, practitioners need to consider the potential test setup overhead of test splits.
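
The trade-off named in the conclusions (cheaper re-executions versus added setup overhead) can be illustrated with a small back-of-the-envelope model. This is a sketch under assumptions that are not taken from the paper: a failed test is re-executed until it passes, flaky failures are independent across executions, and a split distributes the flakiness evenly over equally long parts. The function names and all numbers are hypothetical.

```python
def expected_cost(duration: float, flaky_prob: float) -> float:
    """Expected total runtime when a test is re-executed until it passes.

    Assumption: every execution fails flakily with the same independent
    probability, so the expected number of executions is 1 / (1 - flaky_prob).
    """
    if not 0.0 <= flaky_prob < 1.0:
        raise ValueError("flaky_prob must be in [0, 1)")
    return duration / (1.0 - flaky_prob)


def expected_cost_split(duration: float, flaky_prob: float,
                        parts: int, setup_overhead: float) -> float:
    """Expected total runtime after splitting one test into equal parts.

    Assumptions: the parts fail independently with probability q each, where
    (1 - q) ** parts == 1 - flaky_prob, and every execution of a part pays
    setup_overhead on top of its share of the original duration.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    q = 1.0 - (1.0 - flaky_prob) ** (1.0 / parts)
    part_duration = duration / parts + setup_overhead
    # Only the part that failed is re-executed, not the whole test.
    return parts * expected_cost(part_duration, q)


if __name__ == "__main__":
    # Hypothetical numbers: a 60-minute test with a 10% flaky failure rate.
    unsplit = expected_cost(60.0, 0.10)
    no_overhead = expected_cost_split(60.0, 0.10, parts=4, setup_overhead=0.0)
    with_overhead = expected_cost_split(60.0, 0.10, parts=4, setup_overhead=2.0)
    print(f"unsplit: {unsplit:.1f} min")
    print(f"4 parts, no setup overhead: {no_overhead:.1f} min")
    print(f"4 parts, 2 min setup each: {with_overhead:.1f} min")
```

Under these assumptions, splitting lowers the expected total runtime when setup is free, because only the failed part is repeated, while a per-part setup cost of two minutes is already enough to outweigh that saving in this example. The model ignores the wall-clock gain from running the parts in parallel.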

Paper Structure (21 sections, 1 equation, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Study subject and data collection.
  • Figure 2: Scatterplot showing the relation between test execution time and mean flakiness rate. The orange line depicts the fitted regression line.
  • Figure 3: Scatterplot showing the relation between available memory and mean flakiness rate. Significant positive correlation in PE dataset, no significant correlation in MT dataset.
  • Figure 4: Scatterplot showing the relation between available CPU threads and mean flakiness rate. Significant positive correlation in PE dataset, no significant correlation in MT dataset.
  • Figure 5: Arithmetic mean of test flakiness rate in both datasets, with tests grouped by the "is distributed" label. Note that the mean for distributed tests on the MT dataset was heavily influenced by a single test with a flakiness rate of 17.1%.
  • ...and 3 more figures
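
The kind of analysis behind Figure 2 (a correlation coefficient plus a fitted regression line) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a Pearson coefficient and an ordinary least-squares fit, which the text above does not confirm, and the data points are hypothetical.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return the means and the centered sums of squares and cross products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two samples."""
    _, _, sxy, sxx, syy = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mean_x, mean_y, sxy, sxx, _ = _centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical per-test values: mean execution time in minutes and
    # mean flakiness rate in percent. Not taken from the study.
    exec_minutes = [2.0, 5.0, 11.0, 20.0, 34.0, 48.0]
    flaky_percent = [0.1, 0.3, 0.2, 0.9, 1.1, 2.0]
    slope, intercept = fit_line(exec_minutes, flaky_percent)
    print(f"r = {pearson_r(exec_minutes, flaky_percent):.2f}")
    print(f"flakiness ~ {slope:.3f} * minutes + {intercept:.3f}")
```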