Test Case Generation for Dialogflow Task-Based Chatbots
Rocco Gianni Rapisarda, Davide Ginelli, Diego Clerissi, Leonardo Mariani
TL;DR
The paper tackles functional testing of task-based chatbots by introducing CTG, a dynamic test generation framework for Dialogflow that starts from Botium seed tests and incrementally expands them using actual bot responses. CTG comprises four components (Generator, Expander, Executor, and Cleaner) that together support runtime execution, path exploration through utterance and entity value expansion, and environment cleanup for reliable regression testing. In an empirical evaluation on seven Dialogflow chatbots, CTG achieves higher correctness and mutation-detection rates than Botium and Charm, and covers more intents and entities. The work advances automated generation of executable tests for conversational AI and points to future work on cross-platform deployment and on more flexible oracles for negative scenarios in LLM-enabled systems.
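To make the division of labour among the four components concrete, here is a minimal toy sketch in Python. It is not CTG's implementation and does not call Dialogflow or Botium: `ToyPizzaBot`, the function names, the `max_turns` bound, and the "reply ends with a question mark" heuristic are all assumptions made for illustration. The sketch only shows the general idea of seeding tests, executing them against a running bot, expanding them based on the replies actually observed, and resetting state between runs.

```python
from typing import List, Tuple

Conversation = List[Tuple[str, str]]  # (user utterance, recorded bot reply)


class ToyPizzaBot:
    """Hypothetical stand-in for a deployed task-based chatbot."""

    def __init__(self) -> None:
        self.orders: List[str] = []   # persistent state that tests can pollute
        self.awaiting_size = False    # dialogue context across turns

    def reply(self, utterance: str) -> str:
        text = utterance.lower()
        if "pizza" in text or self.awaiting_size:
            for size in ("small", "large"):
                if size in text:
                    self.awaiting_size = False
                    self.orders.append(size)
                    return f"Ordered a {size} pizza. Total orders: {len(self.orders)}."
            self.awaiting_size = True
            return "Which size?"
        return "Sorry, I did not understand."


def generate(seeds: List[List[str]]) -> List[List[str]]:
    """Generator: turn seed conversations (user side only) into initial tests."""
    return [list(seed) for seed in seeds]


def execute(bot: ToyPizzaBot, test: List[str]) -> Conversation:
    """Executor: run the test and record the actual replies as the oracle."""
    return [(utterance, bot.reply(utterance)) for utterance in test]


def expand(conversation: Conversation, entity_values: List[str],
           max_turns: int = 3) -> List[List[str]]:
    """Expander: if the last observed reply asks for input, extend the test
    with one follow-up per candidate entity value (assumed heuristic)."""
    utterances = [utterance for utterance, _ in conversation]
    last_reply = conversation[-1][1]
    if last_reply.endswith("?") and len(utterances) < max_turns:
        return [utterances + [value] for value in entity_values]
    return []


def clean(bot: ToyPizzaBot) -> None:
    """Cleaner: reset state so every test starts from the same environment."""
    bot.orders.clear()
    bot.awaiting_size = False


def build_suite(seeds: List[List[str]], entity_values: List[str]) -> List[Conversation]:
    bot = ToyPizzaBot()
    suite: List[Conversation] = []
    worklist = generate(seeds)
    while worklist:
        test = worklist.pop(0)
        conversation = execute(bot, test)
        clean(bot)
        suite.append(conversation)
        worklist.extend(expand(conversation, entity_values))
    return suite


suite = build_suite([["I want a pizza"]], ["small", "large"])
for conversation in suite:
    print(conversation)
```

In this toy setting the single seed grows into three recorded conversations, one per explored path. Without the `clean` step, the order counter would carry over between runs and the recorded replies would depend on execution order, which is the kind of flakiness an environment cleanup step is meant to avoid.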
Abstract
Chatbots are software components, typically embedded in web and mobile applications, designed to assist users in a wide range of activities, from chit-chat to task completion. They support diverse forms of interaction, such as text and voice commands. Like any software, chatbots are susceptible to bugs, and their pervasiveness in our lives, together with the underlying technological advancements, calls for tailored quality assurance techniques. However, test case generation techniques for conversational chatbots are still limited. In this paper, we present Chatbot Test Generator (CTG), an automated testing technique designed for task-based chatbots. We conducted an experiment comparing CTG with the state-of-the-art tools BOTIUM and CHARM on seven chatbots, and observed that the test cases generated by CTG outperformed those of the competitors in terms of robustness and effectiveness.
