BertRLFuzzer: A BERT and Reinforcement Learning Based Fuzzer
Piyush Jha, Joseph Scott, Jaya Sriram Ganeshna, Mudit Singh, Vijay Ganesh
TL;DR
BertRLFuzzer addresses the problem of fuzzing Web applications whose input grammars are complex and unknown. It combines a pre-trained BERT model with reinforcement learning to learn grammar-adhering mutation operators automatically from seed inputs. A PPO-based RL loop chooses mutations based on feedback from the victim app, so the fuzzer can explore attack vectors without hand-crafted grammars or labeled data. Evaluated on 9 real-world Web apps against 13 black-box and white-box fuzzers, BertRLFuzzer improves on the nearest competing tool with a 54% lower time to first attack, 17 new vulnerabilities found, and a 4.4% higher attack rate, albeit with slightly higher parser penalties due to learned grammar representations. The approach offers a scalable, extensible path to automatic vulnerability discovery that can adapt to new attack classes (e.g., XSS) with minimal human effort.
Abstract
We present BertRLFuzzer, a novel BERT and Reinforcement Learning (RL) based fuzzer aimed at finding security vulnerabilities in Web applications. BertRLFuzzer works as follows: given a set of seed inputs, the fuzzer performs grammar-adhering and attack-provoking mutation operations on them to generate candidate attack vectors. The key insight of BertRLFuzzer is the use of RL with a BERT model as an agent to guide the fuzzer to efficiently learn grammar-adhering and attack-provoking mutation operators. To establish the efficacy of BertRLFuzzer, we compare it against a total of 13 black-box and white-box fuzzers over a benchmark of 9 victim websites with over 16K LOC. We observed a significant improvement relative to the nearest competing tool in terms of time to first attack (54% less), new vulnerabilities found (17 new vulnerabilities), and attack rate (4.4% more attack vectors generated).
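The mutate-and-observe loop described above can be illustrated with a deliberately small sketch. This is not the BertRLFuzzer implementation: the BERT agent and PPO training are replaced by a tabular epsilon-greedy value estimate, the victim application is a toy string check, and every name here (`VOCAB`, `toy_victim`, `mutate`, `fuzz`, `SEED`) is a hypothetical chosen for illustration. It only shows how reward feedback from the victim can steer the choice of token-level mutations applied to a seed input.

```python
import random

# Hypothetical token vocabulary; the real tool learns token structure with BERT.
VOCAB = ["admin", "'", " OR ", " AND ", "1", "2", "="]


def toy_victim(candidate):
    """Stand-in for the victim app: 1.0 if the input contains a tautology injection."""
    return 1.0 if "' OR 1=1" in candidate else 0.0


def mutate(tokens, position, token):
    """Return a copy of `tokens` with the token at `position` replaced."""
    mutated = list(tokens)
    mutated[position] = token
    return mutated


def fuzz(seed_tokens, episodes=500, epsilon=0.2, lr=0.5, rng_seed=0):
    """Epsilon-greedy search over single-token mutations of one seed input."""
    rng = random.Random(rng_seed)
    actions = [(p, t) for p in range(len(seed_tokens)) for t in VOCAB]
    value = {a: 0.0 for a in actions}  # running reward estimate per mutation
    attacks = set()
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)  # explore a random mutation
        else:
            best = max(value.values())
            # exploit: pick among the highest-valued mutations, ties broken randomly
            action = rng.choice([a for a in actions if value[a] == best])
        candidate = "".join(mutate(seed_tokens, *action))
        reward = toy_victim(candidate)  # feedback from the (toy) victim
        value[action] += lr * (reward - value[action])
        if reward > 0:
            attacks.add(candidate)
    return attacks


# Assumed seed input, already tokenized; it is not itself an attack.
SEED = ["admin", "'", " OR ", "1", "=", "2"]
found = fuzz(SEED)
```

In this toy setting only one mutation (replacing the final `2` with `1`) triggers the victim, so the value table concentrates on it once it has been sampled. The actual tool instead conditions a BERT policy on the whole input string and can apply sequences of mutations.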
