A Comparative Quality Metric for Untargeted Fuzzing with Logic State Coverage

Gwangmu Lee

TL;DR

This work addresses the lack of reliable metrics for comparing untargeted fuzzers by proposing logic state coverage, a proxy that counts distinct logic states (sets of satisfied branches observed in a single execution) to quantify the interesting behaviors a fuzzer has observed. The authors formalize fuzzing, define the logic space and the desirable properties of logic states, and present a Bloom-filter-based measurement pipeline that records, updates, and counts logic state coverage with near-constant overhead. A preliminary evaluation using AFL++ and XMLLint shows gradual growth in logic state coverage over a 24-hour run and provides a framework for comparing fuzzers beyond edge coverage and bug counts. The proposed metric aims to yield a stronger guarantee about the absence of unknown abnormal behaviors and can be integrated into fuzzing platforms, with caveats for multi-threaded settings and targeted fuzzing scenarios.
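To make the distinction between the two metrics concrete, here is a minimal sketch of exact (non-Bloom-filter) counting. The branch IDs, the `executions` list, and both function names are hypothetical, and treating each satisfied branch as one covered edge is a simplification made for this illustration rather than the paper's definition.

```python
# Hypothetical data: each execution is the set of branch IDs it satisfied.
executions = [
    {"b1", "b2"},
    {"b1", "b3"},
    {"b1", "b2", "b3"},
    {"b2", "b3"},
    {"b1", "b2"},  # same logic state as the first execution
]


def edge_coverage(runs):
    """Count branches satisfied by at least one execution."""
    covered = set()
    for branches in runs:
        covered |= branches
    return len(covered)


def logic_state_coverage(runs):
    """Count distinct sets of satisfied branches (one set per execution)."""
    return len({frozenset(branches) for branches in runs})


print(edge_coverage(executions))         # 3
print(logic_state_coverage(executions))  # 4
```

In this toy campaign, edge coverage already reaches its maximum after the second execution, while logic state coverage keeps growing as new combinations of branches appear.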

Abstract

While fuzzing is widely accepted as an efficient program testing technique, it is still unclear how to measure the comparative quality of different fuzzers. The current de facto quality metrics are edge coverage and the number of discovered bugs, but they are frequently discredited by inconclusive, exaggerated, or even counter-intuitive results. To establish a more reliable quality metric, we first note that fuzzing aims to reduce the number of unknown abnormal behaviors by observing more interesting behaviors (i.e., behaviors related to unknown abnormal behaviors). The more interesting behaviors a fuzzer has observed, the stronger the guarantee it can provide about the absence of unknown abnormal behaviors. This suggests that the number of observed interesting behaviors must directly indicate fuzzing quality. In this work, we propose logic state coverage as a proxy metric for counting observed interesting behaviors. A logic state is the set of branches satisfied during one execution, and logic state coverage is the number of distinct logic states observed during a fuzzing campaign. Logic states distinguish less repetitive (i.e., more interesting) behaviors at a finer granularity, making logic state coverage reliably proportional to the number of observed interesting behaviors. We implemented logic state coverage using a Bloom filter and performed a preliminary evaluation with AFL++ and XMLLint.
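The abstract does not spell out the implementation, so the following is only a sketch of how a Bloom filter could approximate the count of distinct logic states in fixed memory. The class name, the filter size, the number of hash functions, and the use of salted BLAKE2b hashes are all assumptions for illustration, not the authors' pipeline.

```python
import hashlib


class BloomLogicStateCounter:
    """Approximate count of distinct logic states (hypothetical helper)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits      # assumed filter size
        self.num_hashes = num_hashes  # assumed number of hash functions
        self.bits = bytearray(num_bits // 8 + 1)
        self.count = 0

    def _positions(self, branches):
        # Canonical encoding: the same set always maps to the same key.
        key = repr(sorted(branches)).encode()
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def observe(self, branches):
        """Record one execution; return True if its logic state looks new."""
        is_new = False
        for pos in self._positions(branches):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        if is_new:
            self.count += 1
        return is_new


counter = BloomLogicStateCounter()
first = counter.observe({"b1", "b2"})   # first time this set is seen
repeat = counter.observe({"b2", "b1"})  # same set, different order
print(first, repeat, counter.count)
```

Because a Bloom filter can report a never-seen logic state as already present (a false positive) but never the reverse, `count` in this sketch is a lower bound on the true number of distinct logic states. Each update touches a fixed number of bits regardless of how many states have been recorded.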

Paper Structure

This paper contains 23 sections, 7 equations, and 6 figures.

Figures (6)

  • Figure 1: The program behavior space and the logic space. Red indicates abnormal program behaviors or logic states.
  • Figure 2: Example control flow graph and the logic states (colored boxes) to which program behaviors with different branch hit counts belong. $m(b)$ denotes the hit count of branch $b$.
  • Figure 3: Example control flow graph and program behaviors. The exceptional condition $e$ is satisfied when the memory pointed to by $q$ has been freed. $l(X)$ denotes the logic state of a program behavior $X$.
  • Figure 4: Operation illustration of a bloom filter.
  • Figure 5: Logic state coverage measurement overview.
  • ...and 1 more figure