Taming Timeout Flakiness: An Empirical Study of SAP HANA

Alexander Berndt, Sebastian Baltes, Thomas Bach

TL;DR

The paper investigates timeout-induced flakiness in SAP HANA system tests and presents an empirical study using two datasets (MT and ATV) with nearly 1 million test executions to quantify the impact of timeouts on regression signals. It proposes a statistical, Cantelli-inspired approach to compute cost-optimal timeout values that balance flaky restarts against average test execution time, avoiding extensive re-execution across many timeout configurations. The results show that timeouts are a major driver of flaky failures (about 70%), that increasing timeouts can dramatically reduce flakiness (roughly 65-80%), and that the optimization can reduce flaky timeouts by up to 80% while lowering median timeouts from 15 to 11 minutes and reducing average costs by up to an order of magnitude. These findings offer practical guidance for automating timeout tuning in large-scale industrial software, with a baseline two-hour timeout proposed to simplify maintenance and adaptivity, while acknowledging hardware and environmental factors that influence timeout behavior. Here, $T$ denotes the test execution time, $t_{max}$ the configured timeout, and $C(t_{max})$ the cost function that is minimized to account for restarts caused by flaky timeouts.
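To make the "Cantelli-inspired" idea more concrete, the following is a minimal sketch of how Cantelli's inequality can turn observed execution times into a timeout candidate. It is not the paper's cost function $C(t_{max})$, which this summary does not spell out. The inequality states that $P(T - \mu \ge k\sigma) \le 1/(1+k^2)$ for $k > 0$, so choosing $k = \sqrt{1/p - 1}$ bounds the probability of exceeding $t_{max} = \mu + k\sigma$ by $p$. The function name `cantelli_timeout`, the parameter `max_exceed_prob`, and the use of sample estimates in place of the true mean and standard deviation are assumptions made for illustration.

```python
import math
import statistics


def cantelli_timeout(durations, max_exceed_prob):
    """Return a timeout t_max = mean + k * std (hypothetical helper).

    k is chosen so that Cantelli's bound 1 / (1 + k^2) equals
    max_exceed_prob. The bound is distribution-free, but here the mean
    and standard deviation are estimated from the sample, so the
    guarantee only holds approximately.
    """
    if not 0 < max_exceed_prob < 1:
        raise ValueError("max_exceed_prob must be strictly between 0 and 1")
    if len(durations) == 0:
        raise ValueError("at least one observed duration is required")

    mean = statistics.fmean(durations)
    std = statistics.pstdev(durations)  # population standard deviation

    # Solve 1 / (1 + k^2) = p for k.
    k = math.sqrt(1.0 / max_exceed_prob - 1.0)
    return mean + k * std


if __name__ == "__main__":
    # Illustrative execution times in minutes (made-up numbers).
    observed = [10, 12, 11, 13, 14]
    print(cantelli_timeout(observed, max_exceed_prob=0.2))
```

Because the bound makes no assumption about the shape of the execution-time distribution, it tends to be conservative; a cost-based optimization such as the one the paper describes would additionally weigh the price of a restart against the price of waiting longer.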

Abstract

Regression testing aims to prevent code changes from breaking existing features. Flaky tests negatively affect regression testing because they result in test failures that are not necessarily caused by code changes, thus providing an ambiguous signal. Test timeouts are one contributing factor to such flaky test failures. With the goal of reducing test flakiness in SAP HANA, we empirically study the impact of test timeouts on flakiness in system tests. We evaluate different approaches to automatically adjust timeout values, assessing their suitability for reducing execution time costs and improving build turnaround times. We collect metadata on SAP HANA's test executions by repeatedly executing tests on the same code revision over a period of six months. We analyze the test flakiness rate, investigate the evolution of test timeout values, and evaluate different approaches for optimizing timeout values. The test flakiness rate ranges from 49% to 70%, depending on the number of repeated test executions. Test timeouts account for 70% of flaky test failures. Developers typically react to flaky timeouts by manually increasing timeout values or splitting long-running tests. However, manually adjusting timeout values is a tedious task. Our approach for timeout optimization reduces timeout-related flaky failures by 80% and reduces the overall median timeout value by 25%, i.e., blocked tests are identified faster. Test timeouts are a major contributing factor to flakiness in system tests. It is challenging for developers to effectively mitigate this problem manually. Our technique for optimizing timeout values reduces flaky failures while minimizing test costs. Practitioners working on large-scale industrial software systems can use our findings to increase the effectiveness of their system tests while reducing the burden on developers to manually maintain appropriate timeout values.
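The dependence of the flakiness rate on the number of repeated executions can be illustrated with a small sketch. It assumes the common definition that a test is flaky if it both passes and fails on the same code revision; the paper's exact definition of the flakiness rate is not given in this summary. The function name `flaky_tests` and the `(test_name, passed)` record layout are hypothetical.

```python
from collections import defaultdict


def flaky_tests(executions):
    """Return the names of tests with mixed outcomes (hypothetical helper).

    `executions` is an iterable of (test_name, passed) pairs that were
    all recorded on the same code revision. A test counts as flaky here
    if it has at least one passing and at least one failing execution.
    """
    outcomes = defaultdict(set)
    for name, passed in executions:
        outcomes[name].add(bool(passed))
    return {name for name, seen in outcomes.items() if seen == {True, False}}


if __name__ == "__main__":
    # Made-up records: "a" is flaky, "b" always passes, "c" always fails.
    runs = [("a", True), ("a", False), ("b", True), ("b", True), ("c", False)]
    print(flaky_tests(runs))
```

Under this definition, a test that fails once can only be recognized as flaky after at least one further execution passes, which is one reason why more repetitions reveal more flaky tests.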

Paper Structure (15 sections, 4 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Test "pyramid" of SAP HANA [bach2022testing].
  • Figure 2: Testing stages of SAP HANA. In this paper, we focus on pre-submit testing.
  • Figure 3: Overview of our data collection process.
  • Figure 4: Histogram showing the distribution of execution times in a sample that consists of $n=535$ test executions. The vertical lines depict the timeout value before and after optimization.
  • Figure 5: Comparison of the average test cost of different static timeout values.
  • ...and 4 more figures