ShellFuzzer: Grammar-based Fuzzing of Shell Interpreters

Riccardo Felici, Laura Pozzi, Carlo A. Furia

TL;DR

ShellFuzzer tackles the underexplored problem of testing Unix shell interpreters by synthesizing diverse shell scripts through grammar-based generation and targeted mutations. It combines three complementary generators with generator-specific oracles and memory-safety sanitizers to detect logic and memory bugs while avoiding destructive test runs. Empirical results on mksh show 8 unique issues (7 confirmed), with the generators proving largely complementary and offering robust, repeatable bug discovery. A comparative analysis with AFL++ reveals that ShellFuzzer excels at identifying logic errors and provides more stable testing without destructive crashes, highlighting its value as a focused tool for shell correctness and reliability.

Abstract

Despite its long-standing popularity and fundamental role in an operating system, the Unix shell has rarely been a subject of academic research. In particular, regardless of the significant progress in compiler testing, there has been hardly any work applying automated testing techniques to detect faults and vulnerabilities in shell interpreters. To address this important shortcoming, we present ShellFuzzer: a technique to test Unix shell interpreters by automatically generating a large number of shell scripts. ShellFuzzer combines grammar-based generation with selected random mutations, so as to produce a diverse range of shell programs with predictable characteristics (e.g., valid according to the language standard, and free from destructive behavior). In our experimental evaluation, ShellFuzzer generated shell programs that exposed 8 previously unknown issues that affected a recent version of the mksh POSIX-compliant shell; the shell maintainers confirmed 7 of these issues, and addressed them in the latest revisions of the shell's open-source implementation.
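
The combination of grammar-based generation, random mutation, and an automatic oracle described above can be illustrated with a minimal sketch. It is not ShellFuzzer's implementation: the toy grammar, the function names (`generate`, `drop_closer`, `syntax_check`, `is_crash`), and the choice to run the shell in no-execute mode (`-n`) are all assumptions made for illustration. The sketch expands a small, side-effect-free grammar into shell scripts, derives likely-invalid variants by deleting a closing keyword, and treats termination by a signal as a crash.

```python
import random
import subprocess

# Hypothetical toy grammar covering a tiny, side-effect-free subset of shell.
# The first alternative of every rule is non-recursive, so expansion terminates.
GRAMMAR = {
    "<script>": [["<stmt>"], ["<stmt>", "\n", "<script>"]],
    "<stmt>": [
        ["echo ", "<word>"],
        ["<var>", "=", "<word>"],
        ["if [ ", "<word>", " = ", "<word>", " ]; then ", "<stmt>", "; fi"],
        ["for ", "<var>", " in ", "<word>", " ", "<word>", "; do ", "<stmt>", "; done"],
    ],
    "<var>": [["x"], ["y"], ["z"]],
    "<word>": [["a"], ["42"], ['"$x"'], ['"${y:-d}"']],
}

CLOSERS = ("; fi", "; done")


def expand(symbol, rng, depth=0, max_depth=6):
    """Recursively expand a grammar symbol into a string."""
    if symbol not in GRAMMAR:
        return symbol  # terminal text is emitted unchanged
    alternatives = GRAMMAR[symbol]
    # Past the depth limit, always take the first (non-recursive) alternative.
    choice = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in choice)


def generate(seed):
    """Produce one script that is valid by construction; same seed, same script."""
    return expand("<script>", random.Random(seed))


def drop_closer(script, rng):
    """Mutation: delete one closing keyword, which makes the script unbalanced."""
    hits = [(i, c) for c in CLOSERS
            for i in range(len(script)) if script.startswith(c, i)]
    if not hits:
        return script  # nothing to mutate
    i, c = rng.choice(hits)
    return script[:i] + script[i + len(c):]


def syntax_check(shell, script, timeout=5):
    """Feed the script to `shell -n` (parse only, no execution) via stdin."""
    return subprocess.run([shell, "-n"], input=script, text=True,
                          capture_output=True, timeout=timeout)


def is_crash(returncode):
    """On POSIX, subprocess reports death by signal N as returncode -N."""
    return returncode < 0
```

Under these assumptions, an oracle for a script from `generate` would expect `syntax_check` to return exit status 0, while a script from `drop_closer` should be rejected with a nonzero status; in both cases `is_crash` should be false. Parsing with `-n` only exercises the shell's front end, so this sketch covers far less than executing the generated scripts would.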

Paper Structure

This paper contains 34 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Overview of how each generator of ShellFuzzer works. A generator produces test cases (shell scripts) with certain characteristics, and is associated with an automatic oracle that captures the fundamental expected behavior of a shell interpreter when it executes those test cases.
  • Figure 2: Three shell scripts generated by ShellFuzzer.
  • Figure 3: Delta-debugging algorithm used to reduce scripts generated by ShellFuzzer.
  • Figure 4: Three final steps of the script reduction procedure of Figure 3, demonstrated on a program generated by $V$. The chunk of code that is removed in each reduction step is highlighted.
  • Figure 5: Which of the 8 unique true-positive alerts observed in ShellFuzzer's experiments were raised by scripts generated by each of the generators $V$, $I$, and $M_I$.
  • ...and 4 more figures