Doc2OracLL: Investigating the Impact of Documentation on LLM-based Test Oracle Generation

Soneya Binta Hossain, Raygan Taylor, Matthew Dwyer

TL;DR

This paper investigates how Java Javadoc comments influence test oracle generation (TOG) using fine-tuned decoder-only LLMs. Through the Doc2OracLL framework, it analyzes three prompt pairs, 10 models, and two data variants based on SF110*, demonstrating that well-structured Javadoc, especially the description and @return components, significantly boosts TOG accuracy and bug detection. GPT-generated Javadoc further improves performance, and in Defects4J experiments, Javadoc alone can match or exceed MUT-based guidance, even outperforming mature baselines. The findings offer practical guidelines for writing concise, behavior-focused Javadoc to maximize automated TOG effectiveness and suggest that documentation can mitigate biases associated with relying on implementation code. Overall, the work advances understanding of how documentation can drive automated software testing and fault detection, with implications for both tooling and engineering practice.

Abstract

Code documentation is a critical aspect of software development, serving as a bridge between human understanding and machine-readable code. Beyond assisting developers in understanding and maintaining code, documentation also plays a critical role in automating various software engineering tasks, such as test oracle generation (TOG). In Java, Javadoc comments provide structured, natural language documentation embedded directly in the source code, typically detailing functionality, usage, parameters, return values, and exceptions. While prior research has utilized Javadoc comments in TOG, there has not been a thorough investigation into their impact when combined with other contextual information, into which components are most relevant for generating correct and strong test oracles, or into their role in detecting real bugs. In this study, we investigate in depth the impact of Javadoc comments on TOG.
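To make the terms concrete, the following is a minimal illustrative sketch, not taken from the paper. It assumes a hypothetical method under test (MUT) named `maxOf`, a hypothetical buggy variant `maxOfBuggy`, and two hand-written oracles standing in for what a model might generate. It shows a Javadoc comment with a description and an @return tag, an oracle that follows the documented behavior, and an oracle that merely mirrors a buggy implementation.

```java
import java.util.function.IntBinaryOperator;

// Illustrative sketch only: all names here are hypothetical, not from the paper.
public class JavadocOracleSketch {

    /**
     * Returns the larger of two integers.
     *
     * @param a the first value
     * @param b the second value
     * @return the larger of {@code a} and {@code b}; either one if they are equal
     */
    public static int maxOf(int a, int b) {
        return a >= b ? a : b;
    }

    /** Hypothetical buggy MUT: same documented contract, wrong comparison. */
    public static int maxOfBuggy(int a, int b) {
        return a <= b ? a : b; // bug: returns the smaller value
    }

    /**
     * Oracle derived from the Javadoc: the test prefix calls the MUT with (3, 7),
     * and the @return tag says the result must be the larger value, 7.
     */
    public static boolean javadocOracleHolds(IntBinaryOperator mut) {
        int actual = mut.applyAsInt(3, 7); // test prefix
        return actual == 7;                // oracle (assertion condition)
    }

    /**
     * Oracle derived from reading the buggy implementation: it expects 3,
     * so it encodes the bug instead of the intended behavior.
     */
    public static boolean implementationOracleHolds(IntBinaryOperator mut) {
        int actual = mut.applyAsInt(3, 7); // same test prefix
        return actual == 3;                // oracle that mirrors the bug
    }

    public static void main(String[] args) {
        System.out.println("Javadoc oracle on correct MUT: "
                + javadocOracleHolds(JavadocOracleSketch::maxOf));
        System.out.println("Javadoc oracle on buggy MUT:   "
                + javadocOracleHolds(JavadocOracleSketch::maxOfBuggy));
        System.out.println("Impl. oracle on buggy MUT:     "
                + implementationOracleHolds(JavadocOracleSketch::maxOfBuggy));
    }
}
```

In this sketch the documentation-based oracle fails on the buggy variant and so exposes the fault, while the implementation-based oracle passes on the same buggy code and hides it. This is the kind of bias toward implementation code that the summary says documentation can help mitigate.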

Paper Structure

This paper contains 29 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Incorrect oracle generated from buggy MUT and correct oracle generated from Javadoc comments.
  • Figure 2: Overview of our approach Doc2OracLL.
  • Figure 3: Examples showing that MUT Sig is not enough to generate correct oracles.
  • Figure 4: Impact of Javadoc comment's description on test oracle generation.
  • Figure 5: Examples where removing @return tag affects (right) and does not affect (left) the generated oracle.
  • ...and 3 more figures