A Tool for Generating Exceptional Behavior Tests With Large Language Models
Linghan Zhong, Samuel Yuan, Jiyang Zhang, Yu Liu, Pengyu Nie, Junyi Jessy Li, Milos Gligoric
TL;DR
The paper addresses the scarcity of exceptional behavior tests (EBTs) by introducing exLong, a CodeLlama-based LLM fine-tuned with context about stack traces, guard expressions, and non-exceptional behavior tests (non-EBTs) to generate targeted EBTs. It supports both developer-oriented generation for a specific method or throw statement and machine-oriented generation across an entire codebase, and it offers a quantized inference option for resource-constrained environments. Empirical results on Java projects show that exLong outperforms CAT-LM, GPT-3.5, Randoop, and EvoSuite both in generating runnable EBTs and in covering target throw statements. The work demonstrates a practical, scalable approach to improving the testing of exception handling through automated, context-aware EBT generation, with a usable CLI and deployment options.
Abstract
Exceptional behavior tests (EBTs) are crucial in software development for verifying that code correctly handles unwanted events and throws appropriate exceptions. However, prior research has shown that developers often prioritize testing "happy paths" (i.e., paths without unwanted events) over exceptional scenarios. We present exLong, a framework that automatically generates EBTs to address this gap. exLong leverages a large language model (LLM) fine-tuned from CodeLlama and incorporates reasoning about exception-throwing traces, conditional expressions that guard throw statements, and non-exceptional behavior tests that execute similar traces. Our demonstration video illustrates how exLong can effectively assist developers in creating comprehensive EBTs for their projects (available at https://youtu.be/Jro8kMgplZk).
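To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not taken from exLong or its output. The class `BoundedStack` and both test methods are hypothetical names invented here. The sketch shows a throw statement, the conditional (guard) expression that protects it, a non-exceptional test that reaches the same method without triggering the throw, and an EBT that deliberately triggers it. Plain Java is used instead of a test framework so the example stays self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EbtSketch {
    /** Hypothetical class under test: a stack with a fixed capacity. */
    static class BoundedStack {
        private final Deque<Integer> items = new ArrayDeque<>();
        private final int capacity;

        BoundedStack(int capacity) {
            this.capacity = capacity;
        }

        void push(int value) {
            // Guard expression: the condition that protects the throw statement.
            if (items.size() >= capacity) {
                // Target throw statement that an EBT is meant to cover.
                throw new IllegalStateException("stack is full");
            }
            items.push(value);
        }

        int size() {
            return items.size();
        }
    }

    /** Non-exceptional test: calls push while the guard is false. */
    static boolean happyPathTest() {
        BoundedStack stack = new BoundedStack(1);
        stack.push(42);
        return stack.size() == 1;
    }

    /** EBT: passes only if the expected exception is actually thrown. */
    static boolean exceptionalBehaviorTest() {
        BoundedStack stack = new BoundedStack(1);
        stack.push(1);
        try {
            stack.push(2); // guard is now true, so this call should throw
            return false;  // reaching this line means no exception was thrown
        } catch (IllegalStateException expected) {
            return "stack is full".equals(expected.getMessage());
        }
    }

    public static void main(String[] args) {
        System.out.println("happy path passed: " + happyPathTest());
        System.out.println("EBT passed: " + exceptionalBehaviorTest());
    }
}
```

In a real project the EBT would typically be written with a test framework's exception assertion rather than a hand-written try/catch. The structure is the same: set up a state in which the guard holds, invoke the method, and check the exception type.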
