ShellFuzzer: Grammar-based Fuzzing of Shell Interpreters
Riccardo Felici, Laura Pozzi, Carlo A. Furia
TL;DR
ShellFuzzer tackles the underexplored problem of testing Unix shell interpreters by synthesizing diverse shell scripts through grammar-based generation and targeted mutations. It combines three complementary generators with generator-specific oracles and memory-safety sanitizers to detect logic and memory bugs while keeping the generated scripts free from destructive behavior. In experiments on mksh, ShellFuzzer exposed 8 previously unknown issues, 7 of which the maintainers confirmed; the three generators proved largely complementary, and bug discovery was repeatable across runs. A comparison with AFL++ indicates that ShellFuzzer is better at identifying logic errors and yields more stable test runs that avoid destructive side effects, which makes it a useful, focused tool for checking shell correctness and reliability.
Abstract
Despite its long-standing popularity and fundamental role in an operating system, the Unix shell has rarely been a subject of academic research. In particular, although compiler testing has made significant progress, there has been hardly any work applying automated testing techniques to detect faults and vulnerabilities in shell interpreters. To address this gap, we present ShellFuzzer: a technique to test Unix shell interpreters by automatically generating a large number of shell scripts. ShellFuzzer combines grammar-based generation with selected random mutations, so as to produce a diverse range of shell programs with predictable characteristics (e.g., valid according to the language standard, and free from destructive behavior). In our experimental evaluation, ShellFuzzer generated shell programs that exposed 8 previously unknown issues affecting a recent version of the POSIX-compliant shell mksh; the shell maintainers confirmed 7 of these issues and addressed them in the latest revisions of the shell's open-source implementation.
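To make the idea of combining grammar-based generation with random mutations concrete, the following is a minimal sketch in Python. It is not ShellFuzzer's implementation: the toy grammar, the function names `generate` and `mutate`, and the single-character deletion mutation are all assumptions chosen for illustration. The grammar covers only `echo`, variable assignment, and `if` with a string test, so every generated script is syntactically valid and free from destructive commands.

```python
import random

# Hypothetical toy grammar for a tiny, side-effect-free subset of POSIX shell.
# Keys are nonterminals; each alternative is a sequence of symbols.
# The first alternative of every nonterminal is non-recursive.
GRAMMAR = {
    "<script>": [["<stmt>"], ["<stmt>", "\n", "<script>"]],
    "<stmt>": [
        ["echo ", "<word>"],
        ["<var>", "=", "<word>"],
        ["if [ ", "<word>", " = ", "<word>", " ]; then ", "<stmt>", "; fi"],
    ],
    "<word>": [["hello"], ["42"], ['"$', "<var>", '"']],
    "<var>": [["x"], ["y"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a string by randomly choosing productions."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Force termination by taking the non-recursive first alternative.
        alternatives = alternatives[:1]
    production = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in production)


def mutate(script, rng):
    """Illustrative mutation (an assumption): delete one random character.

    The result may no longer be valid shell, which is useful for checking
    that an interpreter rejects malformed input gracefully.
    """
    if not script:
        return script
    i = rng.randrange(len(script))
    return script[:i] + script[i + 1:]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    valid = generate("<script>", rng)
    print(valid)
    print("--- mutated ---")
    print(mutate(valid, rng))
```

Each generated script could then be passed to the shell under test (for example through `subprocess.run` with the script on standard input), with the exit status and any sanitizer output compared against what the generator predicts. Restricting the grammar to harmless commands is what makes the scripts safe to run in bulk.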
