
Is Stateful Fuzzing Really Challenging?

Cristian Daniele

TL;DR

The paper analyzes the challenges of stateful fuzzing relative to stateless fuzzing, highlighting how the need to mutate both messages and their order complicates achieving comprehensive coverage. It surveys approaches that bypass statefulness (prefix-based steering, grammar descriptions, artificial loops, targeted fuzzing, and multi-message inputs) and those that explicitly handle state through trace-based fuzzing, state inference, and snapshotting. Benchmarking stateful fuzzers is identified as a major hurdle, with ProFuzzBench providing coverage metrics but not robustly capturing state coverage, leading to potential biases. The authors argue that the field remains immature and highly system-specific, calling for scalable, generic stateful fuzzers and improved benchmarking to advance practical utility.
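The TL;DR's central point is that a stateful fuzzer must mutate both the content of each message and the order of the messages. The sketch below illustrates those two mutation dimensions; it is not the paper's method. It assumes a seed is a list of byte-string messages. The function names `mutate_message` and `mutate_trace`, the operator set (bit flip, swap, duplicate, drop), and the FTP-like seed are hypothetical choices made for illustration.

```python
import random
from typing import List

# Hypothetical representation: one protocol message is raw bytes,
# and one fuzzing input (a "trace") is an ordered list of messages.
Message = bytes
Trace = List[Message]


def mutate_message(msg: Message, rng: random.Random) -> Message:
    """Content-level mutation: flip one random bit of one random byte."""
    if not msg:
        # An empty message cannot be bit-flipped, so emit one random byte.
        return bytes([rng.randrange(256)])
    data = bytearray(msg)
    i = rng.randrange(len(data))
    data[i] ^= 1 << rng.randrange(8)
    return bytes(data)


def mutate_trace(trace: Trace, rng: random.Random) -> Trace:
    """Apply one mutation, chosen at random, to the content or to the order."""
    t = list(trace)  # never modify the caller's seed in place
    if not t:
        return [mutate_message(b"", rng)]
    op = rng.choice(["content", "swap", "duplicate", "drop"])
    if op == "content" or len(t) < 2:
        # Message-level mutation: the sequence length and order are kept.
        i = rng.randrange(len(t))
        t[i] = mutate_message(t[i], rng)
    elif op == "swap":
        # Order-level mutation: exchange two messages.
        i, j = rng.sample(range(len(t)), 2)
        t[i], t[j] = t[j], t[i]
    elif op == "duplicate":
        # Order-level mutation: repeat a message, e.g. to revisit a state.
        i = rng.randrange(len(t))
        t.insert(i + 1, t[i])
    else:
        # Order-level mutation: drop a message (at least one remains).
        del t[rng.randrange(len(t))]
    return t


if __name__ == "__main__":
    rng = random.Random(0)
    seed = [b"USER anonymous\r\n", b"PASS guest\r\n", b"LIST\r\n"]
    for _ in range(3):
        print(mutate_trace(seed, rng))
```

Even this toy mutator shows why the search space grows: besides the byte-level space of each message, the fuzzer must explore permutations, repetitions, and omissions of messages. Many of those sequences are rejected early by the target's state machine.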

Abstract

Fuzzing has proven extremely effective at finding vulnerabilities in software. When it comes to fuzzing stateless systems, analysts have no doubts about which tool to choose. Among the plethora of stateless fuzzers devised in the last 20 years, AFL (with its descendants AFL++ and LibAFL) stands out for its effectiveness, speed, and ability to find bugs. With stateful systems, on the other hand, it is not clear which tool is best. In fact, the research community struggles to devise (and benchmark) effective and generic stateful fuzzers. In this short paper, we discuss the reasons that make stateful fuzzers difficult to devise and benchmark.
