Optimization-Aware Test Generation for Deep Learning Compilers
Qingchao Shen, Zan Wang, Haoyang Ma, Yongqiang Tian, Lili Huang, Zibo Xiao, Junjie Chen, Shing-Chi Cheung
TL;DR
This work tackles the challenge of reliably testing DL compiler optimizations by introducing OATest, which extracts optimization-aware patterns from documented tests and synthesizes diverse, valid computational graphs by embedding these patterns into seed graphs. It relies on two synthesis strategies, reusing existing context edges and creating bridging nodes, together with an abstraction step that maintains validity. It then performs differential testing on TVM and ONNXRuntime using crash and inconsistency oracles. Empirical results show that OATest detects 58 previously unknown bugs, 36 of which have been confirmed or fixed by developers, and achieves substantially higher code coverage than state-of-the-art fuzzers on both compilers. The findings highlight the practicality of pattern-based graph synthesis for exposing optimization bugs and point to future work on fusion-operator robustness and reinforcement-learning-guided test generation.
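To make the two synthesis strategies more concrete, the following is a minimal sketch on a toy graph representation. Everything in it (the `Node` class, the `embed_pattern` function, and the placeholder names) is hypothetical and is not OATest's implementation. The sketch assumes that a pattern's free inputs are placeholders with required shapes. Where a context node of the required shape already exists, the placeholder is bound to that node, which approximates edge reuse. Otherwise the sketch inserts a `reshape` bridging node fed by a context node with the same element count. Real DL compiler graphs also carry dtypes and operator attributes, and the pattern outputs would have to be wired back into the context; the sketch omits both.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, List, Tuple


@dataclass
class Node:
    # Hypothetical toy IR node: name, operator, input node names, output shape.
    name: str
    op: str
    inputs: List[str]
    shape: Tuple[int, ...]


def embed_pattern(seed: List[Node], pattern: List[Node],
                  placeholders: Dict[str, Tuple[int, ...]]) -> List[Node]:
    """Insert `pattern` into a copy of `seed`, binding each placeholder input.

    Reuse: bind a placeholder to an existing context node with the same shape.
    Bridge: otherwise add a reshape node fed by a context node whose element
    count matches, so that the new edge satisfies the shape constraint.
    """
    graph = list(seed)
    binding: Dict[str, str] = {}
    for ph, shape in placeholders.items():
        match = next((n for n in graph if n.shape == shape), None)
        if match is None:
            src = next((n for n in graph if prod(n.shape) == prod(shape)), None)
            if src is None:
                raise ValueError(f"no compatible context node for {ph}")
            match = Node(f"bridge_{ph}", "reshape", [src.name], shape)
            graph.append(match)
        binding[ph] = match.name
    for p in pattern:
        # Replace placeholder inputs with the context nodes they were bound to.
        graph.append(Node(p.name, p.op, [binding.get(i, i) for i in p.inputs], p.shape))
    return graph
```

Consider a seed graph `x (2, 4) -> relu` and a `matmul -> relu` pattern whose placeholders require shapes (2, 4) and (4, 2). The first placeholder reuses `x` directly. The second has no shape match in the context, so it receives a `reshape` bridge from `x`, because both tensors hold 8 elements.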
Abstract
Deep Learning (DL) compilers are widely used to optimize DL models for efficient deployment across diverse hardware. Because they play a vital role in the DL ecosystem, ensuring their reliability and security is critical. However, existing approaches are limited in testing the optimization stages, which are the core functionality of DL compilers, because generating optimization-aware tests is difficult. In this paper, we propose OATest, a novel approach for synthesizing optimization-aware computational graphs. OATest extracts patterns from documented optimization tests and incorporates them into seed computational graphs, enabling broader exploration of optimization paths. To guarantee the optimization-awareness of the generated graphs, OATest introduces an edge-reusing strategy that establishes strong connections between patterns and their contexts. Additionally, to address the validity challenge for the generated graphs, OATest employs an auxiliary-layer addition strategy to resolve broken constraints. Equipped with two distinct test oracles, OATest applies differential testing to evaluate two widely used DL compilers (i.e., TVM and ONNXRuntime). Our experimental results show that OATest outperforms the state-of-the-art method by detecting more bugs and achieving higher code coverage in TVM and ONNXRuntime. Additionally, OATest uncovers 58 previously unknown bugs, 36 of which have been confirmed or fixed by developers.
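The differential-testing step with two oracles can be illustrated with a minimal sketch. The following code is hypothetical and is not OATest's harness. Each "backend" is assumed to be a zero-argument callable that compiles and runs the same test graph and returns its flattened output. In practice such callables might wrap a compiler with optimizations disabled and enabled. The crash oracle reports any exception. The inconsistency oracle compares every backend that did not crash against the first one, using `math.isclose`. The tolerances `rtol` and `atol` are assumed values, not values taken from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: compile and run one test graph, return flat output.
Backend = Callable[[], Sequence[float]]


def differential_test(backends: Dict[str, Backend],
                      rtol: float = 1e-3, atol: float = 1e-5) -> List[str]:
    """Run one test graph on several backends and apply two oracles."""
    reports: List[str] = []
    outputs: Dict[str, List[float]] = {}
    for name, run in backends.items():
        try:
            outputs[name] = list(run())
        except Exception as exc:
            # Crash oracle: any exception during compilation or execution.
            reports.append(f"crash: {name}: {type(exc).__name__}: {exc}")
    names = list(outputs)
    if not names:
        return reports
    # Inconsistency oracle: the first backend that did not crash is the reference.
    ref_name = names[0]
    ref = outputs[ref_name]
    for name in names[1:]:
        out = outputs[name]
        consistent = len(out) == len(ref) and all(
            math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
            for a, b in zip(out, ref)
        )
        if not consistent:
            reports.append(f"inconsistency: {name} vs {ref_name}")
    return reports
```

Listing the unoptimized build first makes it the reference. Any optimized build whose output drifts beyond the tolerance is then flagged as a candidate optimization bug.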
