Towards Autonomous Testing Agents via Conversational Large Language Models

Robert Feldt; Sungmin Kang; Juyeon Yoon; Shin Yoo

TL;DR

The paper addresses the burden of software testing by proposing SocraTest, a middleware-enabled framework for conversational, potentially autonomous LLM-based testing agents. It introduces a taxonomy of interaction modes from simple dialogue to autonomous planning and tool execution, and demonstrates the approach with a GPT-4 dialogue that surfaces testing strategies and executable code. It argues that greater autonomy and tool integration can substantially reduce developer effort while acknowledging current limitations in planning capabilities and tool access. The work outlines a roadmap for building practical, tool-powered testing agents and highlights key challenges for real-world deployment.

Abstract

Software testing is an important part of the development cycle, yet testing software adequately requires specialized expertise and substantial developer effort. Recent discoveries of the capabilities of large language models (LLMs) suggest that they can be used as automated testing assistants that provide helpful information and even drive the testing process. To highlight the potential of this technology, we present a taxonomy of LLM-based testing agents based on their level of autonomy, and describe how a greater level of autonomy can benefit developers in practice. We provide an example use of an LLM as a testing assistant to demonstrate how a conversational framework for testing can help developers. The example also shows how the often-criticized hallucination of LLMs can be beneficial for testing. We identify other tangible benefits that LLM-driven testing agents can bestow, and also discuss potential limitations.
