Test Oracle Automation in the era of LLMs

Facundo Molina, Alessandra Gorla

TL;DR

The paper examines whether Large Language Models (LLMs) can automate test oracles of three major types: test assertions, contracts, and metamorphic relations. It surveys prompt-based and model-tuning approaches, reviews early results in assertion generation, and discusses the challenges and the lack of prior work for contracts and metamorphic relations. It highlights critical threats, including oracle deficiencies and data leakage, and proposes assurance-driven strategies (Assured LLMSE) to mitigate these risks and improve oracle quality. The work offers a roadmap for safer, more effective LLM-assisted oracle automation, with practical mitigation steps, and calls for further empirical study. Overall, it positions LLMs as a promising but risky tool for improving fault detection through automated oracles, one that requires robust assurance and evaluation frameworks.
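To make the three oracle types concrete, the following is a minimal sketch in Python. It is not taken from the paper: the `Stack` class, its `capacity` parameter, and the helper functions are hypothetical names chosen for illustration, and the contracts are written as plain `assert` statements rather than with any particular contract library.

```python
class Stack:
    """Hypothetical bounded stack, used only to illustrate oracle types."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.items = []

    def push(self, item):
        # Contract oracle (precondition): the stack must not be full.
        assert len(self.items) < self.capacity, "precondition: stack not full"
        old_size = len(self.items)
        self.items.append(item)
        # Contract oracle (postcondition): size grew by one, item is on top.
        assert len(self.items) == old_size + 1
        assert self.items[-1] == item

    def pop(self):
        # Contract oracle (precondition): the stack must not be empty.
        assert self.items, "precondition: stack not empty"
        return self.items.pop()

    def size(self):
        return len(self.items)


def test_assertion_oracle():
    """Test assertion: expected values for one specific scenario."""
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.pop() == 2
    assert s.size() == 1
    return True


def check_metamorphic_relation(values, extra):
    """Metamorphic relation: push followed by pop leaves the stack unchanged.

    No expected output is hard-coded; the oracle relates two executions.
    """
    s = Stack(capacity=len(values) + 1)
    for v in values:
        s.push(v)
    before = list(s.items)
    s.push(extra)
    s.pop()
    return s.items == before
```

The test assertion checks a single scenario, the contracts hold for every call to `push` and `pop`, and the metamorphic relation holds for any input sequence, which is why the three types differ in how hard they are to generate automatically.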

Abstract

The effectiveness of a test suite in detecting faults depends heavily on the correctness and completeness of its test oracles. Large Language Models (LLMs) have already demonstrated remarkable proficiency in tackling diverse software testing tasks, such as automated test generation and program repair. This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles. We also aim to initiate discussions on the primary threats that SE researchers must consider when employing LLMs for oracle automation, including oracle deficiencies and data leakage.

Paper Structure (9 sections, 4 figures)

This paper contains 9 sections and 4 figures.

Figures (4)

  • Figure 1: A simple test for a Stack class.
  • Figure 2: Implementation of a push operation for a Stack.
  • Figure 3: Overview of LLM-based Oracle Generation.
  • Figure 4: Test from Defects4J, and the assertions produced by ChatGPT-3.5.