PrediQL: Automated Testing of GraphQL APIs with LLMs
Shaolun Liu, Sina Marefat, Omar Tsai, Yu Chen, Zecheng Deng, Jia Wang, Mohammad A. Tayebi
TL;DR
PrediQL addresses security testing of GraphQL APIs by integrating retrieval-augmented reasoning with adaptive, bandit-guided prompting in a closed-loop fuzzing framework. It models query generation as a multi-armed bandit problem over four prompting dimensions, using Thompson Sampling to balance exploration and exploitation, and grounds generation in past traces through a retrieval memory. A context-aware vulnerability detector analyzes responses to identify injection, access-control, and information-disclosure issues, going beyond static signature matching. Evaluation across diverse GraphQL APIs and multiple LLMs shows higher schema coverage (~16% on average, up to 50%) and broader vulnerability detection than baselines, demonstrating that intelligent exploration with retrieval-augmented reasoning can turn API security testing into a proactive, self-improving process.
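The bandit formulation can be illustrated with a minimal Beta-Bernoulli Thompson Sampling sketch. It assumes each arm is a single prompting strategy and that the reward is binary (1 if a generated query counts as "useful", e.g. it reaches new schema coverage). The arm names, success rates, and reward definition below are hypothetical and are not PrediQL's actual four prompting dimensions or reward signal.

```python
import random
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about one arm's success probability."""
    alpha: float = 1.0  # 1 + observed successes (uniform Beta(1, 1) prior)
    beta: float = 1.0   # 1 + observed failures


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms."""

    def __init__(self, arms, seed=None):
        self.posteriors = {arm: BetaPosterior() for arm in arms}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one sample from each arm's posterior and play the arm with the largest draw.
        draws = {arm: self.rng.betavariate(p.alpha, p.beta)
                 for arm, p in self.posteriors.items()}
        return max(draws, key=draws.get)

    def update(self, arm, reward):
        # Binary reward: success increments alpha, failure increments beta.
        p = self.posteriors[arm]
        if reward:
            p.alpha += 1
        else:
            p.beta += 1


# Hypothetical prompting strategies, NOT PrediQL's actual four dimensions.
ARMS = ["schema_focused", "error_guided", "injection_payload", "deep_nesting"]
# Made-up probability that a query from each strategy is "useful" (simulation only).
TRUE_RATES = {"schema_focused": 0.2, "error_guided": 0.35,
              "injection_payload": 0.6, "deep_nesting": 0.1}


def simulate(rounds=2000, seed=0):
    """Run the bandit against a simulated target and count pulls per arm."""
    sampler = ThompsonSampler(ARMS, seed=seed)
    env = random.Random(seed + 1)
    pulls = {arm: 0 for arm in ARMS}
    for _ in range(rounds):
        arm = sampler.select()
        pulls[arm] += 1
        sampler.update(arm, env.random() < TRUE_RATES[arm])
    return pulls


if __name__ == "__main__":
    print(simulate())  # most pulls should go to the highest-rate arm
```

Sampling from each posterior, rather than always playing the arm with the highest mean, is what keeps rarely tried strategies in play: an arm with few observations has a wide posterior and occasionally produces the largest draw, while consistently rewarding arms accumulate most of the pulls.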
Abstract
GraphQL's flexible query model and nested data dependencies expose APIs to complex, context-dependent vulnerabilities that are difficult to uncover with conventional testing tools. Existing fuzzers rely either on random payload generation or on rigid mutation heuristics, and they fail to adapt to the dynamic structures of GraphQL schemas and responses. We present PrediQL, the first retrieval-augmented, LLM-guided fuzzer for GraphQL APIs. PrediQL combines large language model reasoning with adaptive feedback loops to generate semantically valid and diverse queries. It models the choice of fuzzing strategy as a multi-armed bandit problem, balancing exploration of new query structures with exploitation of past successes. To improve efficiency, PrediQL retrieves and reuses execution traces, schema fragments, and prior errors, enabling self-correction and progressive learning across test iterations. Beyond input generation, PrediQL integrates a context-aware vulnerability detector that uses LLM reasoning to analyze responses, interpreting data values, error messages, and status codes to identify issues such as injection flaws, access-control bypasses, and information disclosure. Our evaluation across open-source and benchmark GraphQL APIs shows that PrediQL achieves significantly higher coverage and vulnerability discovery rates than state-of-the-art baselines. These results demonstrate that combining retrieval-augmented reasoning with adaptive fuzzing can transform API security testing from reactive enumeration into intelligent exploration.
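The retrieval step, which reuses execution traces and prior errors, can be pictured with a small trace memory. This is a minimal sketch assuming plain token-overlap (Jaccard) ranking as a stand-in for whatever retriever PrediQL actually uses; `TraceMemory`, `build_prompt`, the example queries, and the prompt layout are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Trace:
    query: str    # GraphQL query that was sent
    outcome: str  # error message or short summary of the response


def tokens(text):
    """Lowercased identifier-like tokens; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


class TraceMemory:
    """Hypothetical store of past executions, ranked by Jaccard token overlap."""

    def __init__(self):
        self.traces = []

    def add(self, query, outcome):
        self.traces.append(Trace(query, outcome))

    def retrieve(self, context, k=2):
        ctx = tokens(context)

        def score(trace):
            tt = tokens(trace.query + " " + trace.outcome)
            union = ctx | tt
            return len(ctx & tt) / len(union) if union else 0.0

        # Highest-overlap traces first; ties keep insertion order.
        return sorted(self.traces, key=score, reverse=True)[:k]


def build_prompt(target, memory, k=2):
    """Hypothetical prompt layout: retrieved traces, then the generation task."""
    lines = ["Previous attempts:"]
    for t in memory.retrieve(target, k):
        lines.append(f"- query: {t.query}\n  outcome: {t.outcome}")
    lines.append(f"Write a new GraphQL query targeting: {target}")
    return "\n".join(lines)


if __name__ == "__main__":
    memory = TraceMemory()
    memory.add("{ user(id: 1) { name email } }",
               'Cannot query field "email" on type "User".')
    memory.add("{ posts { title } }", "200 OK, 10 posts returned")
    memory.add('{ user(id: "1 OR 1=1") { name } }', "SQL syntax error near 'OR'")
    print(build_prompt("user field email", memory, k=1))
```

In this toy example the top-ranked trace is the failed `email` query together with its schema error, which is the kind of context that lets the model avoid repeating the same mistake on its next attempt.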
