Table of Contents
Fetching ...

Tratto: A Neuro-Symbolic Approach to Deriving Axiomatic Test Oracles

Davide Molinelli, Alberto Martin-Lopez, Elliott Zackrone, Beyza Eken, Michael D. Ernst, Mauro Pezzè

TL;DR

Tratto presents a neuro-symbolic framework that derives axiomatic test oracles from code and documentation by token-by-token generation guided by a grammar-based symbolic module and a transformer-based neural module. The architecture enables compilable, semantically appropriate oracles with a multitask model that jointly evaluates whether to produce an oracle and selects the next token. Empirical results show Tratto outperforms state-of-the-art symbolic (Jdoctor) and neural (GPT-4) baselines in accuracy, precision, and F1, while maintaining robustness to documentation variations and increasing mutation scores when integrated into EvoSuite workflows. The work contributes substantial datasets and demonstrates that integrating symbolic constraints with neural guidance can markedly improve axiomatic oracle generation for software testing. The approach holds potential for broader adoption in program understanding, requirements validation, and runtime verification.

Abstract

This paper presents Tratto, a neuro-symbolic approach that generates assertions (boolean expressions) that can serve as axiomatic oracles, from source code and documentation. The symbolic module of Tratto takes advantage of the grammar of the programming language, the unit under test, and the context of the unit (its class and available APIs) to restrict the search space of the tokens that can be successfully used to generate valid oracles. The neural module of Tratto uses transformers fine-tuned for both deciding whether to output an oracle or not and selecting the next lexical token to incrementally build the oracle from the set of tokens returned by the symbolic module. Our experiments show that Tratto outperforms the state-of-the-art axiomatic oracle generation approaches, with 73% accuracy, 72% precision, and 61% F1-score, largely higher than the best results of the symbolic and neural approaches considered in our study (61%, 62%, and 37%, respectively). Tratto can generate three times more axiomatic oracles than current symbolic approaches, while generating 10 times less false positives than GPT4 complemented with few-shot learning and Chain-of-Thought prompting.

Tratto: A Neuro-Symbolic Approach to Deriving Axiomatic Test Oracles

TL;DR

Tratto presents a neuro-symbolic framework that derives axiomatic test oracles from code and documentation by token-by-token generation guided by a grammar-based symbolic module and a transformer-based neural module. The architecture enables compilable, semantically appropriate oracles with a multitask model that jointly evaluates whether to produce an oracle and selects the next token. Empirical results show Tratto outperforms state-of-the-art symbolic (Jdoctor) and neural (GPT-4) baselines in accuracy, precision, and F1, while maintaining robustness to documentation variations and increasing mutation scores when integrated into EvoSuite workflows. The work contributes substantial datasets and demonstrates that integrating symbolic constraints with neural guidance can markedly improve axiomatic oracle generation for software testing. The approach holds potential for broader adoption in program understanding, requirements validation, and runtime verification.

Abstract

This paper presents Tratto, a neuro-symbolic approach that generates assertions (boolean expressions) that can serve as axiomatic oracles, from source code and documentation. The symbolic module of Tratto takes advantage of the grammar of the programming language, the unit under test, and the context of the unit (its class and available APIs) to restrict the search space of the tokens that can be successfully used to generate valid oracles. The neural module of Tratto uses transformers fine-tuned for both deciding whether to output an oracle or not and selecting the next lexical token to incrementally build the oracle from the set of tokens returned by the symbolic module. Our experiments show that Tratto outperforms the state-of-the-art axiomatic oracle generation approaches, with 73% accuracy, 72% precision, and 61% F1-score, largely higher than the best results of the symbolic and neural approaches considered in our study (61%, 62%, and 37%, respectively). Tratto can generate three times more axiomatic oracles than current symbolic approaches, while generating 10 times less false positives than GPT4 complemented with few-shot learning and Chain-of-Thought prompting.

Paper Structure

This paper contains 30 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Workflow of Tratto.
  • Figure 2: Data collection process.
  • Figure 3: Converting one oracle sample into four token samples. Oracle ($o$) at the top; partial oracles ($po$) and legal tokens ($lt[]$) on top and bottom white boxes, respectively; next tokens ($t$) in green.