Towards Autonomous Testing Agents via Conversational Large Language Models
Robert Feldt, Sungmin Kang, Juyeon Yoon, Shin Yoo
TL;DR
The paper addresses the burden of software testing by proposing SocraTest, a middleware-enabled framework for conversational, potentially autonomous LLM-based testing agents. It introduces a taxonomy of interaction modes from simple dialogue to autonomous planning and tool execution, and demonstrates the approach with a GPT-4 dialogue that surfaces testing strategies and executable code. It argues that greater autonomy and tool integration can substantially reduce developer effort while acknowledging current limitations in planning capabilities and tool access. The work outlines a roadmap for building practical, tool-powered testing agents and highlights key challenges for real-world deployment.
Abstract
Software testing is an important part of the development cycle, yet adequately testing software requires specialized expertise and substantial developer effort. Recent discoveries of the capabilities of large language models (LLMs) suggest that they can be used as automated testing assistants that provide helpful information and even drive the testing process. To highlight the potential of this technology, we present a taxonomy of LLM-based testing agents based on their level of autonomy, and describe how a greater level of autonomy can benefit developers in practice. We provide an example use of an LLM as a testing assistant to demonstrate how a conversational framework for testing can help developers. This example also highlights how the often-criticized hallucination of LLMs can be beneficial for testing. We identify other tangible benefits that LLM-driven testing agents can bestow, and also discuss potential limitations.
