Test Case Generation for Dialogflow Task-Based Chatbots

Rocco Gianni Rapisarda, Davide Ginelli, Diego Clerissi, Leonardo Mariani

TL;DR

The paper tackles functional testing of task-based chatbots by introducing CTG, a dynamic test generation framework for Dialogflow that starts from Botium seed tests and incrementally expands them using actual bot responses. CTG comprises four components (Generator, Expander, Executor, and Cleaner) that enable runtime execution, path exploration through utterance and entity-value expansion, and environment cleanup for reliable regression testing. An empirical evaluation on seven Dialogflow chatbots shows that CTG achieves higher correctness and mutation-detection rates than Botium and Charm, with broader coverage of intents and entities. This work advances the automated generation of executable tests for conversational AI and points to future work on cross-platform deployment and on more flexible oracles for negative scenarios in LLM-enabled systems.
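The Generator/Expander/Executor/Cleaner pipeline summarized above can be illustrated with a minimal Python sketch. It is not CTG's implementation: `MockBot` is a hypothetical stand-in for a Dialogflow agent, the seed test is modeled as a tuple of turns rather than a Botium file, and the names `Turn`, `expand_turn`, `expand`, `execute`, and `generate`, as well as the one-turn-at-a-time expansion strategy, are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Turn:
    """One exchange: a user utterance and the bot reply recorded for it."""
    user: str
    bot: str


TestCase = tuple[Turn, ...]


@dataclass
class MockBot:
    """Hypothetical stand-in for a Dialogflow agent with mutable state."""
    bookings: list[str] = field(default_factory=list)

    def reply(self, utterance: str) -> str:
        words = utterance.lower().split()
        if "book" in words or "reserve" in words:
            city = words[-1].capitalize()
            self.bookings.append(city)  # side effect the Cleaner must undo
            return f"Booked a table in {city}."
        if words and words[0] in {"hello", "hi"}:
            return "Hi! How can I help?"
        return "Sorry, I did not get that."

    def reset(self) -> None:
        self.bookings.clear()


def expand_turn(utterance: str, paraphrases: dict[str, list[str]],
                entity_values: dict[str, list[str]]) -> list[str]:
    """Expander (assumed strategy): alternative phrasings plus entity-value swaps."""
    out: list[str] = []
    for phrasing in [utterance] + paraphrases.get(utterance, []):
        out.append(phrasing)
        for value, alternatives in entity_values.items():
            if value in phrasing:
                out.extend(phrasing.replace(value, alt) for alt in alternatives)
    return list(dict.fromkeys(out))  # drop duplicates, keep order; [0] is the original


def expand(seed: list[str], paraphrases: dict[str, list[str]],
           entity_values: dict[str, list[str]]) -> list[list[str]]:
    """Vary one turn at a time to keep the number of new paths small."""
    variants = []
    for i, utterance in enumerate(seed):
        for alt in expand_turn(utterance, paraphrases, entity_values)[1:]:
            variants.append(seed[:i] + [alt] + seed[i + 1:])
    return variants


def execute(bot: MockBot, utterances: list[str]) -> TestCase:
    """Executor: record the bot's actual replies as the expected outputs."""
    try:
        return tuple(Turn(u, bot.reply(u)) for u in utterances)
    finally:
        bot.reset()  # Cleaner: undo side effects so each run starts clean


def generate(bot: MockBot, seed: TestCase, paraphrases: dict[str, list[str]],
             entity_values: dict[str, list[str]]) -> list[TestCase]:
    """Generator: re-run the seed, then every expanded variant."""
    seed_utterances = [t.user for t in seed]
    suite = [execute(bot, seed_utterances)]
    for utterances in expand(seed_utterances, paraphrases, entity_values):
        suite.append(execute(bot, utterances))
    return suite


seed = (Turn("hello", "Hi! How can I help?"),
        Turn("book a table in milan", "Booked a table in Milan."))
bot = MockBot()
suite = generate(bot, seed,
                 {"book a table in milan": ["reserve a table in milan"]},
                 {"milan": ["rome", "paris"]})
for test in suite:
    print([(t.user, t.bot) for t in test])
```

Because the sketch stores whatever the bot actually answers as the expected reply, the resulting suite is regression-oriented: it detects changes in behavior, not whether the original behavior was correct.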

Abstract

Chatbots are software systems, typically embedded in Web and mobile applications, that assist users in a plethora of activities, from chit-chatting to task completion. They support diverse forms of interaction, such as text and voice commands. Like any software, chatbots are susceptible to bugs, and their pervasiveness in our lives, together with the underlying technological advancements, calls for tailored quality assurance techniques. However, test case generation techniques for conversational chatbots are still limited. In this paper, we present Chatbot Test Generator (CTG), an automated testing technique designed for task-based chatbots. We conducted an experiment comparing CTG with the state-of-the-art tools Botium and Charm on seven chatbots, observing that the test cases generated by CTG outperformed those of the competitors in terms of robustness and effectiveness.

Paper Structure

This paper contains 16 sections, 5 figures, 5 tables, and 3 algorithms.

Figures (5)

  • Figure 1: CTG architecture.
  • Figure 2: Overview of a Botium test case and Dialogflow chatbot data.
  • Figure 3: A flaky test case generated by all the techniques.
  • Figure 4: Test cases generated by CTG capturing dynamic responses.
  • Figure 5: Comparison of the same test case as generated by Botium and CTG.