Targeted Testing of Compiler Optimizations via Grammar-Level Composition Styles

Zitong Zhou, Ben Limpanukorn, Hong Jin Kang, Jiyuan Wang, Yaoxuan Wu, Akos Kiss, Renata Hodovan, Miryung Kim

TL;DR

The paper tackles the difficulty of thoroughly testing compiler optimizations, which is hampered by phase-ordering effects and incomplete optimization pipelines. It introduces TargetFuzz, a grammar-based mutational fuzzer that learns composition styles—grammar-level structural relations that trigger optimizations—from an optimization corpus and replays them in a diverse seed corpus via automatically synthesized mutators. TargetFuzz supports language-agnostic adaptation through grammar annotations and offers both targeted fuzzing of individual optimizations and whole-pipeline fuzzing, showing improved coverage and higher optimization-trigger throughput on LLVM and MLIR, while also exposing bugs that baselines miss. By formalizing a taxonomy of composition styles and coupling them with parameterized mutator synthesis, the approach provides a general, scalable way to exercise deep compiler logic across evolving dialects and languages, complementing traditional fuzzing workflows.

Abstract

Ensuring the correctness of compiler optimizations is critical, but existing fuzzers struggle to test optimizations effectively. First, most fuzzers use optimization pipelines (heuristics-based, fixed sequences of passes) as their harness. The phase-ordering problem can enable or preempt transformations, so pipelines inevitably miss optimization interactions; moreover, many optimizations are not scheduled, even at aggressive levels. Second, optimizations typically fire only when inputs satisfy specific structural relationships, which existing generators and mutations struggle to produce. We propose targeted fuzzing of individual optimizations to complement pipeline-based testing. Our key idea is to exploit composition styles, the structural relations over program constructs (adjacency, nesting, repetition, ordering) that optimizations look for. We build a general-purpose, grammar-based mutational fuzzer, TargetFuzz, that (i) mines composition styles from an optimization-relevant corpus, then (ii) rebuilds them inside different contexts offered by a larger, generic corpus via synthesized mutations to test variations of optimization logic. TargetFuzz adapts to a new programming language through lightweight, grammar-based construct annotations, and it automatically synthesizes mutators and crossovers to rebuild composition styles. It needs no hand-coded generators or language-specific mutators, which is particularly useful for modular frameworks such as MLIR, whose dialect-based, rapidly evolving ecosystem makes optimizations difficult to fuzz. Our evaluation on LLVM and MLIR shows that, compared to baseline fuzzers under the targeted fuzzing mode, TargetFuzz improves coverage by 8% and 11% and triggers optimizations 2.8$\times$ and 2.6$\times$ as often, respectively. We show that targeted fuzzing is complementary: it effectively tests all 37 sampled LLVM optimizations, while pipeline-fuzzing missed 12.
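The mine-then-rebuild idea in the abstract can be illustrated with a toy sketch. It is not TargetFuzz's implementation or API: the `Construct` class, `mine_adjacency`, and `replicate_mutator` are hypothetical names introduced here, and only one composition style (adjacency of sibling constructs) is modeled. It assumes construct trees are simple labeled trees.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Construct:
    """Hypothetical construct-tree node, e.g. kind='loop' or kind='branch'."""
    kind: str
    children: List["Construct"] = field(default_factory=list)


def mine_adjacency(tree: Construct) -> Set[Tuple[str, str]]:
    """Collect (kind, kind) pairs for sibling constructs that are adjacent."""
    styles = set()
    for left, right in zip(tree.children, tree.children[1:]):
        styles.add((left.kind, right.kind))
    for child in tree.children:
        styles |= mine_adjacency(child)
    return styles


def replicate_mutator(seed: Construct, kind: str) -> Construct:
    """Return a copy of `seed` where the first construct of `kind` (pre-order)
    is duplicated right after itself, rebuilding a (kind, kind) adjacency."""
    mutated = copy.deepcopy(seed)

    def visit(node: Construct) -> bool:
        for i, child in enumerate(node.children):
            if child.kind == kind:
                node.children.insert(i + 1, copy.deepcopy(child))
                return True
            if visit(child):
                return True
        return False

    visit(mutated)
    return mutated


# Optimization-corpus program (assumed shape): two adjacent loops.
opt_tree = Construct("function", [
    Construct("loop", [Construct("stmt")]),
    Construct("loop", [Construct("stmt")]),
])

# Seed-corpus program (assumed shape): a single loop nested in a branch.
seed_tree = Construct("function", [
    Construct("decl"),
    Construct("branch", [Construct("loop", [Construct("stmt")])]),
])

styles = mine_adjacency(opt_tree)

# Replay each self-adjacency style (kind next to the same kind) in the seed.
mutated = seed_tree
for left_kind, right_kind in sorted(styles):
    if left_kind == right_kind:
        mutated = replicate_mutator(mutated, left_kind)
```

Here the mined style is "a loop adjacent to a loop", and the mutator recreates it inside the seed's branch, a context the optimization corpus never contained. A real system would cover the other relations the abstract lists (nesting, repetition, ordering) and would have to keep the mutated program valid, which this sketch ignores.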

Paper Structure

This paper contains 32 sections, 2 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: A test program that triggers Loop Fusion and its optimized output.
  • Figure 2: Phase 1: TargetFuzz uses the supplied grammar to parse the optimization corpus and seed corpus into grammar parse-trees, then it uses supplied grammar annotations to translate them respectively into construct trees. Phase 2: TargetFuzz extracts composition styles, e.g., loops are adjacent, from the optimization construct trees - programs that contain 'breadcrumbs' of what's necessary to trigger optimizations. Phase 3: each composition style synthesizes mutators, e.g. replicating a loop. Mutators are applied on the seed construct trees to reconstruct compositions and test optimizations. Blue boxes are user inputs.
  • Figure 3: A snippet of the ANTLR C grammar with three production rules. '|' separates alternatives of each rule.
  • Figure 4: TargetFuzz's Construct API enables defining custom program constructs that are relevant to compiler optimizations, through grammar annotation.
  • Figure 5: Construct Tree of the optimized C program in Figure 1. Grammar parse-tree shown in the background. Colored boxes are nodes (constructs), bold lines are edges. Blue arrows are type-declaration-use chains.
  • ...and 8 more figures