Your Compiler is Backdooring Your Model: Understanding and Exploiting Compilation Inconsistency Vulnerabilities in Deep Learning Compilers

Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray

TL;DR

This paper reveals a novel security risk in deep-learning compilers: official, unmodified compilers can silently alter model semantics during compilation, enabling backdoors that activate only after compilation. It introduces DcL-BD, a backdoor crafted by splitting a model into two sub-networks and exploiting floating-point deviations to flip predictions post-compilation while keeping pre-compilation behavior benign. In experiments across six models, three compilers, and two hardware platforms, the authors report 100% attack success on triggered inputs after compilation while preserving normal accuracy, and the attack generalizes across compilers, hardware, and floating-point settings, as well as to NLP models. In the wild, 31 of the top 100 HuggingFace models turn out to harbor natural triggers. The work also analyzes defenses, showing that fine-tuning is insufficient and that formal verification remains challenging for floating-point deviations, which motivates research on secure compilation and verification tooling. Overall, the paper highlights a practical and broad threat surface that can undermine trustworthy ML deployment and calls for rigorous, numerically robust compiler design and validation.
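
To make the idea of a floating-point deviation concrete, the sketch below shows how merely reordering a sum can flip a thresholded prediction. It is a toy illustration and not the authors' DcL-BD construction: the names `eager_sum`, `reassociated_sum`, and `classify`, the threshold, and the input values are all hypothetical, and the magnitudes are exaggerated so that the effect is visible in double precision.

```python
# Toy illustration (assumption: not the paper's method). Two mathematically
# equivalent summation orders stand in for a model before and after a
# compiler re-associates its arithmetic.

def eager_sum(xs):
    """Accumulate left to right, as the uncompiled model might."""
    total = 0.0
    for x in xs:
        total += x
    return total


def reassociated_sum(xs):
    """Accumulate right to left: same math, different rounding."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total


def classify(score, threshold=0.5):
    """Hypothetical decision stage: class 1 if the score exceeds the threshold."""
    return 1 if score > threshold else 0


# Clean input: every partial sum is exactly representable, so order is irrelevant.
clean = [0.25, 0.5, 0.125]

# Trigger input: 1.0 is lost when it is added to -1e16 first, because
# adjacent doubles near 1e16 are 2 apart.
trigger = [1e16, -1e16, 1.0]

for name, xs in [("clean", clean), ("trigger", trigger)]:
    before = classify(eager_sum(xs))
    after = classify(reassociated_sum(xs))
    print(f"{name}: before={before} after={after}")
```

Both evaluation orders agree on the clean input, but they disagree on the trigger input even though the two sums are mathematically identical. This is the kind of inconsistency a compilation-dependent backdoor needs: behavior that looks benign in one numeric execution and flips in another.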

Abstract

Deep learning (DL) compilers are core infrastructure in modern DL systems, offering flexibility and scalability beyond vendor-specific libraries. This work uncovers a fundamental vulnerability in their design by asking: can an official, unmodified compiler alter a model's semantics during compilation and introduce hidden backdoors? We study both adversarial and natural settings. In the adversarial case, we craft benign models in which triggers have no effect pre-compilation but become effective backdoors after compilation. Tested on six models, three commercial compilers, and two hardware platforms, our attack yields 100% success on triggered inputs while preserving normal accuracy and remaining undetected by state-of-the-art detectors. The attack generalizes across compilers, hardware, and floating-point settings. In the natural setting, we analyze the top 100 HuggingFace models (including one with 220M+ downloads) and find natural triggers in 31 models. This shows that compilers can introduce risks even without adversarial manipulation. Our results reveal an overlooked threat: unmodified DL compilers can silently alter model semantics. To our knowledge, this is the first work to expose inherent security risks in DL compiler design, opening a new direction for secure and trustworthy ML.

Paper Structure

This paper contains 50 sections, 13 equations, 12 figures, 12 tables, 1 algorithm.

Figures (12)

  • Figure 1: Overview of the workflow of DL compilers.
  • Figure 2: The simple DL model used for our case study.
  • Figure 3: Attack scenario.
  • Figure 4: Design overview of our approach.
  • Figure 5: Comparison of DcL-BD with baseline methods.
  • ...and 7 more figures

Theorems & Definitions (3)

  • Definition 1
  • Definition 2
  • Definition 3