The Seeds of the FUTURE Sprout from History: Fuzzing for Unveiling Vulnerabilities in Prospective Deep-Learning Libraries

Zhiyuan Li, Jingzheng Wu, Xiang Ling, Tianyue Luo, Zhiqing Rui, Yanjun Wu

TL;DR

FUTURE is a universal fuzzing framework for newly introduced and prospective DL libraries. It harvests historical bugs from established libraries, fine-tunes LLMs to generate and mutate seed code, and applies differential testing to reveal bugs in target libraries. The approach combines label-guided historical bug collection, universal prompt construction, and LoRA-based fine-tuning to produce seeds that expose errors across diverse backends and libraries, including CVEs across MLX, MindSpore, and OneFlow. Empirical evaluation on three recent libraries shows that FUTURE outperforms baselines in bug detection, API coverage, and the validity of code generation and code conversion, while also identifying bugs in PyTorch. The results demonstrate the framework’s capability to leverage historical knowledge to secure future DL ecosystems and to propagate improvements back to existing libraries, completing a cycle from history to future and back.

Abstract

The widespread application of large language models (LLMs) underscores the importance of deep learning (DL) technologies that rely on foundational DL libraries such as PyTorch and TensorFlow. Despite their robust features, these libraries face challenges with scalability and adaptation to rapid advancements in the LLM community. In response, tech giants like Apple and Huawei are developing their own DL libraries to enhance performance, increase scalability, and safeguard intellectual property. Ensuring the security of these libraries is crucial, and fuzzing is a vital means of doing so. However, existing fuzzing frameworks struggle with target flexibility, with effectively testing bug-prone API sequences, and with leveraging the limited information available in new libraries. To address these limitations, we propose FUTURE, the first universal fuzzing framework tailored for newly introduced and prospective DL libraries. FUTURE leverages historical bug information from existing libraries and fine-tunes LLMs for specialized code generation. This strategy helps identify bugs in new libraries and uses insights from these libraries to enhance security in existing ones, creating a cycle from history to future and back. To evaluate FUTURE's effectiveness, we conduct comprehensive evaluations on three newly introduced DL libraries. Evaluation results demonstrate that FUTURE significantly outperforms existing fuzzers in bug detection, success rate of bug reproduction, validity rate of code generation, and API coverage. Notably, FUTURE has detected 148 bugs across 452 targeted APIs, including 142 previously unknown bugs. Among these, 10 have been assigned CVE IDs. Additionally, FUTURE detects 7 bugs in PyTorch, demonstrating that findings in new libraries can in turn improve the security of existing ones.

Paper Structure

This paper contains 29 sections, 3 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: Overview of FUTURE. FUTURE leverages historical bug information from source libraries and available API information from target libraries to perform code-pair generation, dataset construction, LLM fine-tuning, and seed code generation. Using this seed code, FUTURE unveils bugs in target libraries through a test oracle. Insights gained from these bugs are used to enhance the security of the source libraries, completing a cycle from history to future and back.
  • Figure 2: Datasets Construction and Fine-tuning. In the prompt template, we emphasize the importance of constructing API inputs that consider NaNs and Infs, edge cases, and scenarios likely to trigger API error checking and crashes. Due to space constraints, these details are omitted in the figure.
  • Figure 3: Seed Code Generation. With the task-specialized fine-tuned LLMs, we perform code conversion and code generation to obtain the seed code for the test oracle.
  • Figure 4: Example bugs found by FUTURE. We provide two bug examples to illustrate that FUTURE not only uses historical bug information from source libraries to unveil bugs in target libraries but also leverages bugs found in target libraries to identify bugs that still reside in source libraries.
  • Figure 5: Statistical distribution of causes and symptoms of bugs detected in target libraries using FUTURE.
  • ...and 1 more figure
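
The TL;DR names differential testing as the way bugs are revealed, and the Figure 2 caption stresses inputs with NaNs, Infs, and other edge cases. The following is a minimal sketch of that idea, not FUTURE's actual oracle: `reference_softmax` and `target_softmax` are hypothetical stand-ins for the same API in a source library and a target library, and the input list, verdict names, and tolerance are assumptions chosen for illustration.

```python
import math

# Edge-case inputs: ordinary values, overflow-prone values, Inf, NaN, empty.
EDGE_INPUTS = [
    [0.0, 1.0, 2.0],
    [1000.0, 1000.0],
    [float("inf"), 0.0],
    [float("nan"), 1.0],
    [],
]


def reference_softmax(xs):
    """Hypothetical stand-in for a source-library API (stable softmax)."""
    if not xs:
        raise ValueError("softmax of an empty input")
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def target_softmax(xs):
    """Hypothetical stand-in for a target-library API (naive softmax)."""
    exps = [math.exp(x) for x in xs]  # overflows for large x, no empty check
    total = sum(exps)
    return [e / total for e in exps]


def run(fn, xs):
    """Execute one API call and capture either its value or its exception."""
    try:
        return ("ok", fn(xs))
    except Exception as exc:
        return ("error", type(exc).__name__)


def same_values(a, b, tol=1e-9):
    """Element-wise comparison that treats NaN as equal to NaN."""
    if len(a) != len(b):
        return False
    return all(
        (math.isnan(x) and math.isnan(y))
        or math.isclose(x, y, rel_tol=tol, abs_tol=tol)
        for x, y in zip(a, b)
    )


def differential_oracle(xs):
    """Classify one input as consistent, status_mismatch, or value_mismatch."""
    ref_status, ref_out = run(reference_softmax, xs)
    tgt_status, tgt_out = run(target_softmax, xs)
    if ref_status != tgt_status:
        return "status_mismatch"  # one side crashed or rejected the input
    if ref_status == "error":
        # Both raised: consistent only if they raised the same exception type.
        return "consistent" if ref_out == tgt_out else "status_mismatch"
    return "consistent" if same_values(ref_out, tgt_out) else "value_mismatch"


if __name__ == "__main__":
    for xs in EDGE_INPUTS:
        print(xs, "->", differential_oracle(xs))
```

In this sketch the overflow-prone input and the empty input are reported as status mismatches, because only one implementation raises, while the Inf input is reported as a value mismatch. A real oracle would also need to separate intended behavioral differences between libraries from genuine bugs, which this toy comparison does not attempt.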