Doc2OracLL: Investigating the Impact of Documentation on LLM-based Test Oracle Generation

Soneya Binta Hossain; Raygan Taylor; Matthew Dwyer

TL;DR

This paper investigates how Javadoc comments influence test oracle generation (TOG) with fine-tuned decoder-only LLMs. Using the Doc2OracLL framework, it evaluates three prompt pairs, 10 models, and two data variants based on SF110*, and shows that well-structured Javadoc, especially the description and @return components, significantly improves TOG accuracy and bug detection. GPT-generated Javadoc improves performance further. In Defects4J experiments, Javadoc alone can match or exceed guidance from the method under test (MUT) and even outperforms mature baselines. The findings yield practical guidelines for writing concise, behavior-focused Javadoc that maximizes automated TOG effectiveness, and they suggest that documentation can mitigate the biases that come from relying on implementation code. Overall, the work clarifies how documentation can drive automated software testing and fault detection, with implications for both tooling and engineering practice.

Abstract

Code documentation is a critical aspect of software development, serving as a bridge between human understanding and machine-readable code. Beyond helping developers understand and maintain code, documentation also plays an important role in automating software engineering tasks such as test oracle generation (TOG). In Java, Javadoc comments provide structured, natural-language documentation embedded directly in the source code, typically detailing functionality, usage, parameters, return values, and exceptions. Although prior research has used Javadoc comments in TOG, no study has thoroughly investigated their impact when combined with other contextual information, identified which components are most relevant for generating correct and strong test oracles, or examined their role in detecting real bugs. In this study, we investigate the impact of Javadoc comments on TOG in depth.
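
To make the Javadoc components named above concrete, the following is a minimal illustrative sketch and is not taken from the paper or its datasets. The class `JavadocOracleExample` and the method `clamp` are hypothetical names chosen for illustration. The comment shows a description, `@param`, `@return`, and `@throws` tags, and the `main` method shows the kind of assertion and exception oracles that could be derived from the `@return` and `@throws` text alone, without reading the method body.

```java
/** Hypothetical utility, used only to illustrate Javadoc structure. */
class JavadocOracleExample {

    /**
     * Restricts a value to the inclusive range [min, max].
     *
     * @param value the number to restrict
     * @param min   the lower bound of the range
     * @param max   the upper bound of the range
     * @return {@code min} if {@code value < min}, {@code max} if
     *         {@code value > max}, otherwise {@code value}
     * @throws IllegalArgumentException if {@code min > max}
     */
    static int clamp(int value, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) {
        // Test prefix: prepare inputs and invoke the method under test.
        int result = clamp(5, 0, 3);

        // Assertion oracle derived from the @return tag: 5 > max, so max is expected.
        if (result != 3) {
            throw new AssertionError("expected 3 but got " + result);
        }

        // Exception oracle derived from the @throws tag: min > max must be rejected.
        boolean thrown = false;
        try {
            clamp(1, 4, 2);
        } catch (IllegalArgumentException expected) {
            thrown = true;
        }
        if (!thrown) {
            throw new AssertionError("expected IllegalArgumentException");
        }

        System.out.println("oracles passed");
    }
}
```

In this sketch the oracles check documented behavior rather than whatever the implementation happens to return, which is the sense in which documentation-derived oracles can expose a bug in the method body.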
