A Tale of Two DL Cities: When Library Tests Meet Compiler

Qingchao Shen, Yongqiang Tian, Haoyang Ma, Junjie Chen, Lili Huang, Ruifeng Fu, Shing-Chi Cheung, Zan Wang

TL;DR

DL compilers frequently fail during model loading because the same operator is used in many different ways across DL libraries. Opera is a lightweight, migration-based approach that extracts operator instances from DL library tests and wraps each one into a single-operator model, which is then used to test the model-loading stage across multiple frontends and compilers. By combining three migration sources with diversity-based test prioritization and two test oracles, Opera detects 170 previously unknown bugs (90 confirmed/fixed) and improves testing efficiency, measured by APFD, by up to 47.4% over general test prioritization strategies. The results show that the knowledge embedded in DL library tests transfers to compiler testing and can improve frontend reliability and regression testing in DL ecosystems.

Abstract

Deep Learning (DL) compilers typically load a DL model and optimize it through intermediate representations. Existing DL compiler testing techniques mainly focus on the model optimization stages and rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries. DL library testing shares this objective, which indicates that the knowledge embedded in DL library tests is beneficial for testing the model loading stage of DL compilers. In this work, we propose OPERA to extract such domain knowledge from the test inputs for DL libraries. OPERA constructs diverse tests from the various test inputs for DL libraries (including the test inputs documented in DL libraries and those generated by recent fuzzers). In addition, it incorporates a diversity-based test prioritization strategy to migrate and execute earlier those test inputs that are more likely to detect diverse bugs. We considered three sources of tests in DL libraries for migration and used eight frontends from three DL compilers (TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed/fixed by developers, demonstrating the effectiveness of this migration-based idea. The test prioritization strategy in OPERA improves testing efficiency with migrated tests by 11.9% to 47.4% on average compared to general test prioritization strategies.
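
To make the efficiency claim concrete, the sketch below computes APFD (Average Percentage of Faults Detected), the standard metric for comparing test orderings, on a toy test suite. The `apfd` function follows the usual textbook definition. The `prioritize_by_diversity` function is only a generic greedy stand-in (pick the test that adds the most unseen features) and should not be read as OPERA's actual strategy, which this page does not specify. The test ids, feature sets, and bug ids are all invented for illustration and are not results from the paper.

```python
from typing import Dict, List, Sequence, Set


def apfd(order: Sequence[str], detects: Dict[str, Set[str]]) -> float:
    """Standard APFD: 1 - (sum of first-detection positions) / (n * m) + 1 / (2n).

    order:   test ids in execution order (n tests)
    detects: maps a test id to the set of fault ids it reveals
    """
    # Record the 1-based position of the first test that reveals each fault.
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            first_pos.setdefault(fault, pos)
    n, m = len(order), len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one detected fault")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


def prioritize_by_diversity(features: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical greedy ordering: run next the test adding most unseen features.

    This is a generic illustration, not OPERA's prioritization algorithm.
    Ties are broken by the original (insertion) order of the tests.
    """
    remaining = list(features)
    seen: Set[str] = set()
    order: List[str] = []
    while remaining:
        # max() returns the first maximal element, so earlier tests win ties.
        best = max(remaining, key=lambda t: len(features[t] - seen))
        order.append(best)
        seen |= features[best]
        remaining.remove(best)
    return order


# Toy suite (invented): each migrated test is described by the operator
# and the parameters it exercises.
features = {
    "t1": {"Conv2D", "padding"},
    "t2": {"Conv2D", "padding"},  # duplicates the usage in t1
    "t3": {"Conv2DTranspose", "output_padding", "dilation"},
    "t4": {"Conv2D", "strides"},
}
# Toy ground truth (invented): which tests expose which bugs.
detects = {"t3": {"bug_A"}, "t4": {"bug_B"}}

original = ["t1", "t2", "t3", "t4"]
prioritized = prioritize_by_diversity(features)

print(prioritized)                 # ['t3', 't1', 't4', 't2']
print(apfd(original, detects))     # 0.25
print(apfd(prioritized, detects))  # 0.625
```

In this toy case the reordering moves the two bug-revealing tests from positions 3 and 4 to positions 1 and 3, which raises APFD from 0.25 to 0.625. A higher APFD means that bugs are exposed earlier in the run.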

Paper Structure

This paper contains 31 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: A motivating example with Conv2DTranspose
  • Figure 2: Patch for a real bug on Conv2DTranspose in TVM
  • Figure 3: Workflow of Opera
  • Figure 4: Template for generating DL models under PyTorch
  • Figure 5: Patch for an Incorrect Code Logic bug
  • ...and 3 more figures