deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses

Georgios Androutsopoulos, Antonio Bianchi

TL;DR

This work presents deepSURF, a system that automatically generates and fuzzes LLM-augmented harnesses to detect memory-safety bugs in Rust libraries, with a focus on unsafe code paths. It combines static analysis that identifies URAPIs (library APIs that can reach unsafe code), dedicated handling of complex and generic types, and LLM-driven augmentation that produces semantically meaningful API sequences, then fuzzes the resulting harnesses to reveal vulnerabilities. Evaluated on 63 real-world crates, deepSURF finds 42 memory-safety bugs (30 previously known and 12 new) and achieves 87.3% URAPI coverage, outperforming state-of-the-art Rust fuzzers. The results point to a practical, scalable way to strengthen memory safety in the Rust ecosystem by integrating static analysis, LLMs, and fuzzing into a unified pipeline.
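To make the idea of fuzzing an API sequence concrete, the following is a minimal sketch of a harness body that decodes raw fuzzer bytes into a series of calls and replays them on a target. It is an illustration only, not deepSURF's generated code: the `Op` enum, `decode_ops`, and `run_harness` are hypothetical names, and a standard `Vec<u8>` stands in for the library type under test.

```rust
/// Hypothetical set of API calls the harness may issue on the target.
#[derive(Debug, PartialEq)]
enum Op {
    Push(u8),
    Pop,
    Truncate(usize),
    Clear,
}

/// Decode fuzzer bytes into API calls: two bytes per call
/// (selector, argument). A trailing odd byte is ignored.
fn decode_ops(data: &[u8]) -> Vec<Op> {
    data.chunks_exact(2)
        .map(|c| match c[0] % 4 {
            0 => Op::Push(c[1]),
            1 => Op::Pop,
            2 => Op::Truncate(c[1] as usize),
            _ => Op::Clear,
        })
        .collect()
}

/// Replay the decoded sequence against the target.
/// Assumption: `Vec<u8>` is only a stand-in for a real library type.
fn run_harness(data: &[u8]) -> Vec<u8> {
    let mut target: Vec<u8> = Vec::new();
    for op in decode_ops(data) {
        match op {
            Op::Push(v) => target.push(v),
            Op::Pop => {
                target.pop();
            }
            Op::Truncate(n) => target.truncate(n),
            Op::Clear => target.clear(),
        }
    }
    target
}
```

In a real setup a function like `run_harness` would be called from a fuzzer entry point, so that the fuzzer mutates `data` and thereby explores different call orders and arguments.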

Abstract

Although Rust ensures memory safety by default, it also permits the use of unsafe code, which can introduce memory safety vulnerabilities if misused. Unfortunately, existing tools for detecting memory bugs in Rust typically exhibit limited detection capabilities, inadequately handle Rust-specific types, or rely heavily on manual intervention. To address these limitations, we present deepSURF, a tool that integrates static analysis with Large Language Model (LLM)-guided fuzzing harness generation to effectively identify memory safety vulnerabilities in Rust libraries, specifically targeting unsafe code. deepSURF introduces a novel approach for handling generics by substituting them with custom types and generating tailored implementations for the required traits, enabling the fuzzer to simulate user-defined behaviors within the fuzzed library. Additionally, deepSURF employs LLMs to augment fuzzing harnesses dynamically, facilitating exploration of complex API interactions and significantly increasing the likelihood of exposing memory safety vulnerabilities. We evaluated deepSURF on 63 real-world Rust crates, successfully rediscovering 30 known memory safety bugs and uncovering 12 previously unknown vulnerabilities (of which 11 have been assigned RustSec IDs and 3 have been patched), demonstrating clear improvements over state-of-the-art tools.
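The generic-substitution idea described above can be illustrated with a small sketch. Assume a library exposes a generic API bounded by `T: Ord`; a harness can instantiate `T` with a custom type whose trait implementation is driven by fuzzer input, so the library observes arbitrary (even inconsistent) user-defined behavior. All names here (`FuzzBytes`, `CustomType`, `max_of`, `harness`) are hypothetical and chosen for illustration; this is not deepSURF's actual output, and `max_of` is a stand-in for a real library function.

```rust
use std::cell::RefCell;
use std::cmp::Ordering;
use std::rc::Rc;

/// Hypothetical pool of fuzzer-provided bytes shared by all custom values.
struct FuzzBytes {
    bytes: Vec<u8>,
    pos: usize,
}

impl FuzzBytes {
    /// Return the next byte, wrapping around; 0 if the input is empty.
    fn next_byte(&mut self) -> u8 {
        if self.bytes.is_empty() {
            return 0;
        }
        let b = self.bytes[self.pos % self.bytes.len()];
        self.pos += 1;
        b
    }
}

/// Hypothetical custom type substituted for a generic parameter `T: Ord`.
struct CustomType {
    id: u32,
    input: Rc<RefCell<FuzzBytes>>,
}

// The comparison result is taken from fuzzer input, so it may violate
// the total-order contract, as arbitrary user code could.
impl Ord for CustomType {
    fn cmp(&self, _other: &Self) -> Ordering {
        match self.input.borrow_mut().next_byte() % 3 {
            0 => Ordering::Less,
            1 => Ordering::Equal,
            _ => Ordering::Greater,
        }
    }
}
impl PartialOrd for CustomType {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for CustomType {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for CustomType {}

/// Stand-in for a generic library API under test (not from a real crate).
fn max_of<T: Ord>(items: &[T]) -> Option<&T> {
    let mut best = items.first()?;
    for item in &items[1..] {
        if item.cmp(best) == Ordering::Greater {
            best = item;
        }
    }
    Some(best)
}

/// Harness body: build custom values from fuzzer bytes and call the API.
/// Returns the `id` of the element the API selected, if any.
fn harness(data: &[u8], count: u32) -> Option<u32> {
    let input = Rc::new(RefCell::new(FuzzBytes { bytes: data.to_vec(), pos: 0 }));
    let items: Vec<CustomType> = (0..count)
        .map(|id| CustomType { id, input: Rc::clone(&input) })
        .collect();
    let best_id = max_of(items.as_slice()).map(|best| best.id);
    best_id
}
```

For example, `harness(&[2, 0], 3)` makes the first comparison report `Greater` and the second `Less`, so the stand-in API selects the element with `id` 1. If the library's unsafe code relied on `Ord` being consistent, inputs like these are what would exercise that assumption.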

Paper Structure

This paper contains 30 sections, 5 figures, 7 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The workflow of deepSURF.
  • Figure 2: deepSURF’s static harness generation for foo.
  • Figure 3: deepSURF's LLM harness augmentation.
  • Figure 4: Triggering DF via API sequence.
  • Figure 5: Triggering DF by simulating user-defined code.