Bugfix: a standard language, database schema and repository for research on bugs and automatic program repair
Victoria Kananchuk, Ilgiz Mustafin, Bertrand Meyer
TL;DR
The paper tackles the fragmentation in automatic program repair (APR) research by introducing Bugfix, a standardized framework with a human-readable Bugfix language, an API, and a public bug/repair repository. It delineates six element kinds (bugs, fixes, applications, examples, constructs, languages) that separate language-independent patterns from language-specific constructs and provide machine-accessible representations via JSON. By enabling precise specification of bug patterns, fixes, and real-world applications, along with a scalable language description mechanism tied to Tree-sitter grammars, Bugfix supports reproducible evaluation and cross-project comparisons. The work presents a public website and data portal for access and extension, framing Bugfix as a community resource to accelerate progress in APR and related bug-analysis research.
Abstract
Automatic Program Repair (APR) is a brilliant idea: when detecting a bug, also provide suggestions for correcting the program. Progress towards that goal is hindered by the absence of a common frame of reference for the multiplicity of APR ideas, methods, tools, programming languages and environments. Bugfix is an effort at providing such a framework: a standardized set of notations, tools and interfaces, as well as a database of bugs and fixes, for use by the APR research community to try out ideas and compare results. The most directly visible component of the Bugfix effort is the Bugfix language, a human-readable formalism making it possible to describe elements of the following kinds: a bug (described abstractly, for example the permutation of two arguments in a call); a bug example (an actual occurrence of a bug, in specific code written in a specific programming language, and usually recorded in some repository); a fix (a particular correction of a bug, obtained for example by reversing the misplaced arguments); an application (an entity that demonstrates how an actual code example matches a fix); a construct (the abstract description of a programming mechanism, for example a ``while'' loop, independently of its realization in a programming language); and a language (a description of how a particular programming language includes certain constructs and provides specific concrete syntax for each of them; for example, Java includes loop, assignment etc. and has a defined format for each of them). A JSON API makes these descriptions available in a form accessible to tools. Bugfix includes a repository containing a considerable number of bugs, examples and fixes. Note: An early step towards this article was a short contribution (Ref [1]) to ICSE 2024. The present text reuses a few elements of introduction and motivation but is otherwise thoroughly reworked and extended.
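To make the six element kinds more concrete, the following Python sketch models them as plain records and serializes them to JSON, using the abstract's own example of two permuted call arguments. It is only an illustration: every class name, field name and the `to_json` helper are hypothetical and are not taken from the actual Bugfix language or JSON API, whose schema is not given here.

```python
import json
from dataclasses import dataclass, asdict, field

# NOTE: all class and field names are hypothetical, chosen for illustration.
# They do not reproduce the actual Bugfix schema.

@dataclass
class Construct:
    name: str  # abstract mechanism, e.g. "call", independent of any language

@dataclass
class Language:
    name: str
    # maps a construct name to a note on its concrete syntax in this language
    syntax: dict[str, str] = field(default_factory=dict)

@dataclass
class Bug:
    name: str
    description: str
    construct: str  # name of the construct the bug pattern involves

@dataclass
class Fix:
    name: str
    bug: str  # name of the bug this fix corrects
    description: str

@dataclass
class Example:
    name: str
    bug: str       # name of the abstract bug this code exhibits
    language: str  # name of the programming language of the code
    code: str

@dataclass
class Application:
    example: str  # name of the bug example
    fix: str      # name of the fix applied to it
    fixed_code: str

def to_json(element) -> str:
    """Serialize one element, tagging it with its kind (lowercased class name)."""
    record = {"kind": type(element).__name__.lower(), **asdict(element)}
    return json.dumps(record, sort_keys=True)

# The abstract's running example: two arguments of a call are permuted.
call = Construct("call")
java = Language("java", {"call": "name(arg1, arg2)"})
bug = Bug("swapped_arguments", "two arguments of a call are permuted", call.name)
fix = Fix("reverse_arguments", bug.name, "swap the two misplaced arguments")
example = Example("ex1", bug.name, java.name, "copy(dest, src);")
application = Application(example.name, fix.name, "copy(src, dest);")

for element in (call, java, bug, fix, example, application):
    print(to_json(element))
```

In this sketch, references between elements are by name rather than by nested object, so that language-independent records (bug, fix, construct) stay separate from language-specific ones (example, application, language). That separation mirrors the distinction drawn above, but the choice of representation is an assumption.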
