Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model
Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, Qing Wang
TL;DR
InputBlaster is an LLM-driven pipeline for automatically generating unusual text inputs that expose crash bugs in mobile apps. It prompts the LLM to produce test generators together with mutation rules, guided by in-context learning and rich prompts. On 36 text input widgets with crash bugs across 31 Android apps it achieves a 78% bug detection rate, and when integrated with an automated GUI testing tool it uncovers 37 unseen crashes. The work also provides empirical insights into text-input constraints, reports ablation results, and shows practical usefulness by discovering, and enabling fixes for, bugs in real-world apps. Overall, InputBlaster advances automated testing of input widgets and highlights the potential of LLM-assisted mutation strategies in software testing.
Abstract
Mobile applications have become a ubiquitous part of our daily life, providing users with access to various services and utilities. Text input, as an important interaction channel between users and applications, plays a key role in core functionality such as search queries, authentication, and messaging. However, certain special text (e.g., -18 for Font Size) can cause the app to crash, so generating diversified unusual inputs to fully test the app is in high demand. This is challenging because of the combinatorial explosion dilemma, high context sensitivity, and complex constraint relations. This paper proposes InputBlaster, which leverages an LLM to automatically generate unusual text inputs for mobile app crash detection. It formulates the unusual input generation problem as a task of producing a set of test generators, each of which can yield a batch of unusual text inputs under the same mutation rule. In detail, InputBlaster leverages the LLM to produce the test generators together with the mutation rules, which serve as the reasoning chain, and uses an in-context learning schema to provide the LLM with demonstration examples that boost performance. InputBlaster is evaluated on 36 text input widgets with crash bugs involving 31 popular Android apps, and the results show that it achieves a 78% bug detection rate, 136% higher than the best baseline. In addition, we integrate it with an automated GUI testing tool and detect 37 unseen crashes in real-world apps from Google Play.
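To make the "test generator plus mutation rule" formulation concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the `TestGenerator` class, the three example rules, and the `is_valid_font_size` constraint (an integer from 1 to 100) are all hypothetical, chosen only to echo the Font Size example in the abstract. In InputBlaster the rules and generators would be produced by the LLM; here they are written by hand.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TestGenerator:
    """Hypothetical pairing of a mutation rule with code that realizes it."""
    mutation_rule: str                          # natural-language rule
    generate: Callable[[random.Random], str]    # produces one unusual input

    def batch(self, n: int, seed: int = 0) -> List[str]:
        """Yield n unusual inputs that all follow the same mutation rule."""
        rng = random.Random(seed)
        return [self.generate(rng) for _ in range(n)]


def is_valid_font_size(text: str) -> bool:
    """Assumed input constraint: a decimal integer between 1 and 100."""
    return text.isdigit() and 1 <= int(text) <= 100


# Hand-written stand-ins for generators an LLM might return.
generators = [
    TestGenerator(
        "negative integer",
        lambda rng: str(-rng.randint(1, 1000)),
    ),
    TestGenerator(
        "non-numeric characters",
        lambda rng: "".join(rng.choice("abc!@#") for _ in range(rng.randint(1, 8))),
    ),
    TestGenerator(
        "extremely large number",
        lambda rng: "9" * rng.randint(20, 40),
    ),
]

if __name__ == "__main__":
    for gen in generators:
        print(gen.mutation_rule, "->", gen.batch(3))
```

Each generator covers one way of violating the assumed constraint, so a small set of generators can stand in for a much larger set of concrete inputs. In a real pipeline, each batch would be typed into the widget and the app observed for a crash.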
