SBFT Tool Competition 2024 -- Python Test Case Generation Track
Nicolas Erni, Al-Ameen Mohammed Ali Mohammed, Christian Birchler, Pouria Derakhshanfar, Stephan Lukasczyk, Sebastiano Panichella
TL;DR
The paper addresses automatic test case generation (TCG) for Python, a language whose dynamic typing and lack of strict type information make test generation difficult. It reports on the first Python track of the SBFT Tool Competition, in which four tools (UTBotPython, Pynguin, Hypothesis Ghostwriter, and Klara) were run on 35 Python files from seven open-source projects with a 400-second time budget and evaluated by line, branch, and mutation coverage. Pynguin ranked first overall, followed by UTBotPython, Hypothesis Ghostwriter, and Klara. On several benchmarks the tools achieved zero or only partial coverage because of code complexity and modularity issues, which highlights the limitations of current TCG approaches for Python. The study provides a reproducible benchmarking framework and documents the practical hurdles of running test generators on real-world Python code, informing tool improvements and more robust evaluation criteria for future editions.
Abstract
Test case generation (TCG) for Python poses distinctive challenges due to the language's dynamic nature and the absence of strict type information. Previous research has successfully explored automated unit TCG for Python, with solutions outperforming random test generation methods. Nevertheless, fundamental issues persist, hindering the practical adoption of existing test case generators. To address these issues, we report on the organization, challenges, and results of the first edition of the Python Testing Competition. Four tools, namely UTBotPython, Klara, Hypothesis Ghostwriter, and Pynguin, were executed on a benchmark set of 35 Python source files sampled from 7 open-source Python projects, with a time budget of 400 seconds. We considered one configuration of each tool for each test subject and evaluated the tools' effectiveness in terms of code and mutation coverage. This paper describes our methodology, the competing tools, the analysis of the results, and the challenges faced while running the competition experiments.
