A Comparative Quality Metric for Untargeted Fuzzing with Logic State Coverage
Gwangmu Lee
TL;DR
This work addresses the lack of reliable metrics for comparing untargeted fuzzers by proposing logic state coverage, a proxy metric that counts distinct logic states (sets of satisfied branches observed in a single execution) to quantify the interesting behaviors a fuzzer has observed. The authors formalize fuzzing, define the logic space and the desirable properties of logic states, and present a Bloom-filter-based measurement pipeline that records, updates, and counts logic state coverage with near-constant overhead. A preliminary evaluation using AFL++ and XMLLint demonstrates gradual growth in logic state coverage over a 24-hour run and provides a framework for comparing fuzzers beyond edge coverage and bug counts. The proposed metric aims to yield a stronger guarantee about the absence of unknown abnormal behaviors and can be integrated into fuzzing platforms, with caveats for multi-threaded settings and targeted fuzzing scenarios.
Abstract
While fuzzing is widely accepted as an efficient program testing technique, it is still unclear how to measure the comparative quality of different fuzzers. The current de facto quality metrics are edge coverage and the number of discovered bugs, but they are frequently discredited by inconclusive, exaggerated, or even counter-intuitive results. To establish a more reliable quality metric, we first note that fuzzing aims to reduce the number of unknown abnormal behaviors by observing more interesting behaviors (i.e., behaviors related to unknown abnormal behaviors). The more interesting behaviors a fuzzer has observed, the stronger the guarantee it can provide about the absence of unknown abnormal behaviors. This suggests that the number of observed interesting behaviors must directly indicate fuzzing quality. In this work, we propose logic state coverage as a proxy metric for counting observed interesting behaviors. A logic state is the set of branches satisfied during one execution, and logic state coverage is the number of distinct logic states observed during a fuzzing campaign. Logic states distinguish less repetitive (i.e., more interesting) behaviors at a finer granularity, making logic state coverage reliably proportional to the number of observed interesting behaviors. We implemented logic state coverage using a Bloom filter and performed a preliminary evaluation with AFL++ and XMLLint.
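To make the definition concrete, the following is a minimal Python sketch of how distinct logic states might be counted with a Bloom filter. It is not the implementation evaluated in this work. The class names, the integer branch identifiers, the filter size, and the number of hash functions are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive one bit position per hash by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: bytes) -> bool:
        """Insert item; return True if it was definitely not present before."""
        is_new = False
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


class LogicStateCoverage:
    """Counts distinct logic states observed across executions (hypothetical helper)."""

    def __init__(self):
        self.seen = BloomFilter()
        self.count = 0

    def record(self, satisfied_branches) -> bool:
        # A logic state is a set, so order and repetition of branches are ignored.
        # Branch identifiers are assumed to be integers assigned at instrumentation time.
        key = ",".join(str(b) for b in sorted(set(satisfied_branches))).encode()
        if self.seen.add(key):
            self.count += 1
            return True
        return False


if __name__ == "__main__":
    cov = LogicStateCoverage()
    cov.record([1, 2, 3])     # first execution
    cov.record([3, 2, 1, 1])  # same set of satisfied branches
    cov.record([1, 2])        # a different set of satisfied branches
    print(cov.count)
```

Because a Bloom filter can report false positives, a count obtained this way can only underestimate the true number of distinct logic states. In exchange, each update touches a fixed number of bits regardless of how many states have already been recorded.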
