I Can't Believe It's Not a Valid Exploit
Derin Gezgin, Amartya Das, Shinhae Kim, Zhengdong Huang, Nevena Stojkovic, Claire Wang
TL;DR
The paper tackles the challenge of reliably generating proof-of-concept (PoC) exploits for Java vulnerabilities with LLMs by introducing PoC-Gym, a three-stage framework that combines static taint-trace guidance with dynamic, execution-based validation. Evaluated on 20 real CVEs, PoC-Gym shows that automated success signals substantially overestimate actual exploitation: post-hoc validation uncovers many false positives. Static trace guidance reduces false positives but does not eliminate them, and even with improvements over prior work such as FaultLine, a large fraction of PoCs remain invalid when checked against ground-truth vulnerability locations. The work highlights the need for stronger execution-level guarantees and post-hoc validation so that LLM-assisted PoC generation reflects genuine exploitation rather than surface-level indicators.
Abstract
Recently, large language models (LLMs) have been applied to security vulnerability detection tasks, including the generation of proof-of-concept (PoC) exploits. A PoC exploit is a program that demonstrates how a vulnerability can be exploited. Several approaches suggest that giving LLMs additional guidance can improve PoC generation outcomes, which motivates a closer evaluation of how effective that guidance is. In this work, we develop PoC-Gym, a framework for LLM-based PoC generation for Java security vulnerabilities and for systematic validation of the generated exploits. Using PoC-Gym, we evaluate whether guidance from static analysis tools improves the PoC generation success rate, and we manually inspect the resulting PoCs. Our results from running PoC-Gym with Claude Sonnet 4, GPT-5 Medium, and gpt-oss-20b show that using static analysis for guidance and criteria leads to 21% higher success rates than the prior baseline, FaultLine. However, manual inspection of both successful and failed PoCs reveals that 71.5% of the PoCs are invalid. These results show that the reported success of LLM-based PoC generation can be significantly misleading, and that this overstatement is hard to detect with current validation mechanisms.
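
To make the gap between a reported success and a valid exploit concrete, the following Python sketch shows one way a post-hoc check against ground-truth vulnerability locations could be expressed. It is an illustration under stated assumptions, not PoC-Gym's implementation: the `PocRun` record, the `is_valid_exploit` and `false_positive_rate` helpers, and the idea of comparing a set of executed method names against a single ground-truth method are all hypothetical. Reaching the vulnerable method is treated here as a necessary condition only; it does not by itself prove exploitation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PocRun:
    """Outcome of executing one generated PoC (hypothetical record layout)."""
    cve_id: str
    automated_success: bool            # what the automated success signal reported
    executed_methods: frozenset[str]   # fully qualified methods seen during the run


def is_valid_exploit(run: PocRun, vulnerable_method: str) -> bool:
    """Accept a PoC only if it was reported successful AND its execution
    actually reached the ground-truth vulnerable method."""
    return run.automated_success and vulnerable_method in run.executed_methods


def false_positive_rate(runs: list[PocRun], ground_truth: dict[str, str]) -> float:
    """Fraction of automatically 'successful' PoCs that fail the post-hoc check."""
    reported = [r for r in runs if r.automated_success]
    if not reported:
        return 0.0
    false_positives = [
        r for r in reported if not is_valid_exploit(r, ground_truth[r.cve_id])
    ]
    return len(false_positives) / len(reported)
```

A harness built this way would report both the raw success rate and the post-hoc rate side by side, so that a PoC which merely triggers a surface-level indicator (for example, an exception thrown outside the vulnerable code path) is counted as a false positive rather than a success.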
