Table of Contents
Fetching ...

Contextualizing Sink Knowledge for Java Vulnerability Discovery

Fabian Fleischer, Cen Zhang, Joonun Jang, Jeongin Cho, Meng Xu, Taesoo Kim

Abstract

Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, GONDAR also demonstrated strong performance in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to improve the security of open-source software.

Contextualizing Sink Knowledge for Java Vulnerability Discovery

Abstract

Java applications are prone to vulnerabilities stemming from the insecure use of security-sensitive APIs, such as file operations enabling path traversal or deserialization routines allowing remote code execution. These sink APIs encode critical information for vulnerability discovery: the program-specific constraints required to reach them and the exploitation conditions necessary to trigger security flaws. Despite this, existing fuzzers largely overlook such vulnerability-specific knowledge, limiting their effectiveness. We present GONDAR, a sink-centric fuzzing framework that systematically leverages sink API semantics for targeted vulnerability discovery. GONDAR first identifies reachable and exploitable sink call sites through CWE-specific scanning combined with LLM-assisted static filtering. It then deploys two specialized agents that work collaboratively with a coverage-guided fuzzer: an exploration agent generates inputs to reach target call sites by iteratively solving path constraints, while an exploitation agent synthesizes proof-of-concept exploits by reasoning about and satisfying vulnerability-triggering conditions. The agents and fuzzer continuously exchange seeds and runtime feedback, complementing each other. We evaluated GONDAR on real-world Java benchmarks, where it discovers four times more vulnerabilities than Jazzer, the state-of-the-art Java fuzzer. Notably, GONDAR also demonstrated strong performance in the DARPA AI Cyber Challenge, and is integrated into OSS-CRS, a sandbox project in The Linux Foundation's OpenSSF, to improve the security of open-source software.

Paper Structure

This paper contains 25 sections, 6 figures, 11 tables, 2 algorithms.

Figures (6)

  • Figure 1: Command injection vulnerability in Jenkins from AIxCC semifinal exemplar. The vulnerability requires satisfying multiple conditions to reach the [0.5]ProcessBuilder sink (line 20) and specific input properties to trigger exploitation (line 23).
  • Figure 2: Overall design of Gondar. Robot icons indicate LLM-based components.
  • Figure 3: Coordinate diagram showing the relationship between vulnerabilities reached and exploited for different tools and configurations.
  • Figure 4: Spider chart showing normalized reached and exploited vulnerabilities per CWE type for baseline settings and different configurations of Gondar. The value at each axis is calculated as the number of vulnerabilities reached/exploited divided by the total number of vulnerabilities for that CWE type.
  • Figure 5: Per-vulnerability matrix showing reached (light) and exploited (dark) status for each of the 54 vulnerabilities across all configurations.
  • ...and 1 more figures