Exceptional Behaviors: How Frequently Are They Tested?

Andre Hora; Gordon Fraser

TL;DR

Prior work focuses on exceptions that propagate to tests. This paper closes that gap by analyzing all exceptions raised at runtime during test execution in 25 Python systems, including those that never reach the tests. The authors instrument the test suites with SpotFlow to capture method-level calls and exceptions, covering 5,372 executed methods, 17.9 million calls, and 1.4 million raised exceptions. They find that 21.4% of the executed methods raise exceptions at runtime and that, in those methods, a median of 1 in 10 calls exercises exceptional behavior. About 80% of the exception-raising methods raise exceptions infrequently, while about 20% do so more frequently. The implications are practical: tools are needed to help exercise exceptional behaviors and to refactor expensive try/except blocks, and exception-raising behavior should not be assumed to be "abnormal" or rare.
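
To make the idea of monitoring exceptions during test execution concrete, the following is a minimal sketch built on Python's standard sys.settrace hook. It is not SpotFlow and does not reproduce the paper's tooling; the names run_with_exception_counts, parse_int, and workload are hypothetical and exist only for illustration. The sketch counts, per function name, how many calls were made and in how many of those calls an exception occurred, including exceptions that are caught internally and never reach the caller.

```python
import sys
from collections import Counter


def run_with_exception_counts(entry_point):
    """Run entry_point() and return (calls, raising_calls) keyed by function name."""
    calls = Counter()
    raising_calls = Counter()
    flagged = set()  # ids of live frames already counted as exception-raising

    def local_trace(frame, event, arg):
        # An 'exception' event fires in the frame where an exception occurs
        # and again in each caller frame it propagates into.
        if event == "exception" and id(frame) not in flagged:
            flagged.add(id(frame))
            raising_calls[frame.f_code.co_name] += 1
        elif event == "return":
            # The frame is finished; forget it so a reused id is not miscounted.
            flagged.discard(id(frame))
        return local_trace

    def global_trace(frame, event, arg):
        if event == "call":
            calls[frame.f_code.co_name] += 1
            return local_trace
        return None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        entry_point()
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return calls, raising_calls


# Hypothetical code under test: the ValueError is handled inside parse_int,
# so it never propagates to the caller.
def parse_int(text):
    try:
        return int(text)
    except ValueError:
        return None


def workload():
    for text in ["1", "2", "x", "4"]:
        parse_int(text)
```

Calling run_with_exception_counts(workload) yields two counters from which a per-function ratio of exception-raising calls to total calls can be derived. Keying by function name alone is a simplification; a real tool would need qualified names and source locations to tell apart methods that share a name.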

Abstract

Exceptions allow developers to handle error cases expected to occur infrequently. Ideally, good test suites should test both normal and exceptional behaviors to catch more bugs and avoid regressions. While current research analyzes exceptions that propagate to tests, it does not explore other exceptions that do not reach the tests. In this paper, we provide an empirical study to explore how frequently exceptional behaviors are tested in real-world systems. We consider both exceptions that propagate to tests and the ones that do not reach the tests. For this purpose, we run an instrumented version of test suites, monitor their execution, and collect information about the exceptions raised at runtime. We analyze the test suites of 25 Python systems, covering 5,372 executed methods, 17.9M calls, and 1.4M raised exceptions. We find that 21.4% of the executed methods do raise exceptions at runtime. In methods that raise exceptions, on the median, 1 in 10 calls exercise exceptional behaviors. Close to 80% of the methods that raise exceptions do so infrequently, but about 20% raise exceptions more frequently. Finally, we provide implications for researchers and practitioners. We suggest developing novel tools to support exercising exceptional behaviors and refactoring expensive try/except blocks. We also call attention to the fact that exception-raising behaviors are not necessarily "abnormal" or rare.
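
As an illustration of what refactoring an expensive try/except block might look like, the sketch below contrasts two equivalent functions. This example is an assumption made for illustration and is not taken from the paper; the function names and the vocabulary-lookup scenario are hypothetical. In CPython, a try block costs very little when nothing is raised, but actually raising and catching an exception is comparatively expensive, so a method whose exceptional path runs on a large share of calls may be a candidate for an explicit check instead.

```python
def count_known_eafp(tokens, vocabulary):
    """Count tokens present in vocabulary, using an exception to signal a miss."""
    found = 0
    for token in tokens:
        try:
            vocabulary[token]  # raises KeyError on every unknown token
            found += 1
        except KeyError:
            pass
    return found


def count_known_lbyl(tokens, vocabulary):
    """Same result, but a miss is handled by a membership test, not an exception."""
    found = 0
    for token in tokens:
        if token in vocabulary:
            found += 1
    return found
```

Both functions return the same count for the same inputs. Whether the second form is actually faster depends on how often the lookup misses, so a change like this is best guided by measured exception frequencies rather than applied everywhere.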

Paper Structure (17 sections, 10 figures, 6 tables)

Introduction
Study Design
Case Studies
Monitoring Methods Executed by Tests
Collecting Data from Executed Methods and Calls
Research Questions
Results
RQ1: How many methods raise exceptions at runtime?
RQ2: How frequently do calls on exception-raising methods actually lead to exceptions?
RQ3: How do exception-raising methods and calls vary by system?
Discussion and Implications
Most Exceptional Behaviors Are Rarely Exercised
Some Exceptional Behaviors Are Frequently Exercised
Refactoring Expensive try/except Blocks
Threats to Validity
...and 2 more sections

Figures (10)

Figure 1: Examples of methods with exceptional behaviors.
Figure 2: Distribution of method calls at runtime.
Figure 3: Distribution of executed paths at runtime.
Figure 4: Distribution of the method calls raising exceptions at runtime (absolute and relative values).
Figure 5: Method get_ttext of the email library in CPython. Calls: 9,188; exception-raising calls: 1 (<1%).
...and 5 more figures
