A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets
Ahmad Abdellatif, Khaled Badran, Diego Elias Costa, Emad Shihab
TL;DR
The paper tackles the scarcity and cost of building high-quality NLU training data for software engineering (SE) chatbots. It proposes a transformer-based augmentation pipeline that combines SE-specific synonym replacement with BART-driven paraphrasing, followed by diversity filtering and entity-labeling heuristics, to produce diverse, semantically consistent queries. Evaluated on three SE datasets with Rasa as the NLU, the approach yields small but reliable improvements in F1-score and increases Rasa's confidence for correctly classified intents; the results also highlight the trade-off between data quality and quantity. The work demonstrates practical benefits for SE chatbot practitioners and outlines directions for domain-tuned paraphrase datasets and broader evaluation across NLU platforms.
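The synonym-replacement and diversity-filtering steps summarized above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `SE_SYNONYMS` dictionary, the whitespace tokenizer, the word-level Jaccard similarity used as the diversity criterion, and the `max_sim` threshold are all hypothetical choices for illustration. The BART paraphrasing and entity-labeling steps are omitted because they depend on a pretrained model and on annotation details not described here.

```python
import itertools
from typing import Dict, List

# Hypothetical SE-specific synonym dictionary (illustrative only, not from the paper).
SE_SYNONYMS: Dict[str, List[str]] = {
    "repo": ["repository", "project"],
    "bug": ["issue", "defect"],
    "commit": ["change", "revision"],
}


def synonym_variants(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Generate every variant obtained by swapping tokens for their listed synonyms."""
    tokens = query.split()  # assumption: naive whitespace tokenization
    options = [[tok] + synonyms.get(tok.lower(), []) for tok in tokens]
    variants = [" ".join(combo) for combo in itertools.product(*options)]
    return [v for v in variants if v != query]  # drop the unchanged query


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity (assumed diversity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def diversity_filter(original: str, candidates: List[str], max_sim: float = 0.8) -> List[str]:
    """Keep candidates that are not too similar to the original or to already-kept ones."""
    kept: List[str] = []
    for cand in candidates:
        if jaccard(cand, original) > max_sim:
            continue  # too close to the source query
        if any(jaccard(cand, k) > max_sim for k in kept):
            continue  # near-duplicate of an accepted augmentation
        kept.append(cand)
    return kept


if __name__ == "__main__":
    query = "who fixed the bug in this repo"
    print(diversity_filter(query, synonym_variants(query, SE_SYNONYMS), max_sim=0.7))
    # → ['who fixed the issue in this repository', 'who fixed the defect in this project']
```

With `max_sim=0.7`, single-word substitutions are rejected as too close to the original, and only mutually dissimilar two-word substitutions survive; a real pipeline would likely tune such a threshold per dataset.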
Abstract
Background: The adoption of chatbots for software development tasks has become increasingly popular among practitioners, driven by their potential to reduce costs and accelerate the software development process. Chatbots understand users' queries through a Natural Language Understanding (NLU) component. To perform well, NLUs must be trained on extensive, high-quality datasets that express the many ways users may interact with chatbots. However, previous studies show that creating a high-quality training dataset for software engineering chatbots is expensive in terms of both resources and time. Aims: Therefore, in this paper, we present an automated transformer-based approach to augment software engineering chatbot datasets. Method: Our approach combines traditional natural language processing techniques with the BART transformer to augment a dataset by generating queries through synonym replacement and paraphrasing. We evaluate the impact of the augmentation approach on Rasa NLU's performance using three software engineering datasets. Results: Overall, the augmentation approach shows promising results in improving Rasa's performance, generating queries with varied sentence structures while preserving their original semantics. Furthermore, it increases Rasa's confidence in its intent classification for correctly classified intents. Conclusions: We believe that our study helps practitioners improve the performance of their chatbots and guides future research in proposing augmentation techniques for SE chatbots.
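The evaluation described above rests on two quantities: an F1-score over intents and the NLU's confidence on correctly classified queries. A minimal sketch of both follows, assuming predictions are available as hypothetical `(true_intent, predicted_intent, confidence)` tuples. The intent names are invented, and macro averaging is an assumption, since this section does not state which F1 averaging scheme the paper uses.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical prediction format: (true_intent, predicted_intent, confidence).
Prediction = Tuple[str, str, float]


def macro_f1(preds: List[Prediction]) -> float:
    """Unweighted mean of per-intent F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted({t for t, _, _ in preds} | {p for _, p, _ in preds})
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred, _ in preds:
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # wrongly assigned to `pred`
            fn[true] += 1  # missed for `true`
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def mean_correct_confidence(preds: List[Prediction]) -> float:
    """Average confidence over correctly classified queries only."""
    confs = [c for t, p, c in preds if t == p]
    return sum(confs) / len(confs) if confs else 0.0


if __name__ == "__main__":
    preds = [
        ("CountCommits", "CountCommits", 0.9),
        ("CountCommits", "FileOwner", 0.4),
        ("FileOwner", "FileOwner", 0.7),
        ("FileOwner", "FileOwner", 0.8),
    ]
    print(round(macro_f1(preds), 3))  # → 0.733
    print(round(mean_correct_confidence(preds), 3))  # → 0.8
```

Computing both numbers for a model trained on the original dataset and for one trained on the augmented dataset gives the kind of before/after comparison the abstract reports.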
