Eradicating the Unseen: Detecting, Exploiting, and Remediating a Path Traversal Vulnerability across GitHub

Jafar Akhoundali, Hamidreza Hamidi, Kristian Rietveld, Olga Gadyatskaya

TL;DR

This paper presents an end-to-end automated pipeline that detects, exploits, patches, and responsibly discloses a specific path traversal vulnerability pattern (CWE-22) across GitHub Node.js projects. By combining keyword-driven code search, static taint analysis, Docker-based exploitation, LLM-generated patches, and staged disclosure, the authors identify 1,756 exploitable repositories and achieve 63 patches in the wild, with a 14% remediation rate among reported cases. The study also examines the root causes and dissemination of the vulnerable pattern across developer resources and reveals that LLMs themselves can be contaminated, generating insecure code even when prompted for security. Overall, the work demonstrates both the potential and the challenges of scalable vulnerability management in open-source ecosystems and highlights the need for improved tooling, ethics, and education to curb widespread code-pattern vulnerabilities.
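
To make the vulnerability class concrete, the following is a minimal sketch of a path traversal flaw (CWE-22) in Node.js and one common containment check. It is not the paper's Figure 2 and not the authors' GPT-4 patch; the directory `/srv/app/public` and the helper names `resolveUnsafe` and `resolveSafe` are hypothetical, and `path.posix` is used only so the behaviour is the same on every platform.

```javascript
// POSIX flavour of the path module, so results do not depend on the host OS.
const path = require("node:path").posix;

// Hypothetical document root that the server intends to expose.
const PUBLIC_DIR = "/srv/app/public";

// Vulnerable sketch: the untrusted request path is joined to the root
// and used as-is, so "../" segments can climb out of PUBLIC_DIR.
function resolveUnsafe(urlPath) {
  return path.join(PUBLIC_DIR, urlPath);
}

// Safer sketch: decode, resolve, then verify the result is still inside
// PUBLIC_DIR. Returns null when the request should be rejected.
function resolveSafe(urlPath) {
  let decoded;
  try {
    decoded = decodeURIComponent(urlPath);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) {
    return null; // NUL bytes are never valid in a file path
  }
  // Prefixing "./" forces the request path to be treated as relative.
  const candidate = path.resolve(PUBLIC_DIR, "./" + decoded);
  const rel = path.relative(PUBLIC_DIR, candidate);
  const escapes =
    rel === ".." || rel.startsWith("../") || path.isAbsolute(rel);
  return escapes ? null : candidate;
}

console.log(resolveUnsafe("/../../../etc/passwd"));
console.log(resolveSafe("/../../../etc/passwd"));
console.log(resolveSafe("/index.html"));
```

The check compares the resolved path against the root rather than searching the input for `..`, because string filtering is easy to bypass with encoded or nested sequences. A production server would typically also resolve symbolic links before the comparison, which this sketch omits.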

Abstract

Vulnerabilities in open-source software can cause cascading effects in the modern digital ecosystem. It is especially worrying when the same vulnerability repeats across many projects, because once adversaries find one instance, they can easily scale up the attack. Unfortunately, since developers frequently reuse code from their own or external code resources, some nearly identical vulnerabilities exist across many open-source projects. We conducted a study to examine the prevalence of a particular vulnerable code pattern that enables path traversal attacks (CWE-22) across open-source GitHub projects. To conduct this study at GitHub scale, we developed an automated pipeline that scans GitHub for the targeted vulnerable pattern, confirms the vulnerability by first running a static analysis and then exploiting the vulnerability in the context of the studied project, assesses its impact by calculating the CVSS score, generates a patch using GPT-4, and reports the vulnerability to the maintainers. Using our pipeline, we identified 1,756 vulnerable open-source projects, some of which are very influential. For many of the affected projects, the vulnerability is critical (CVSS score higher than 9.0), as it can be exploited remotely without any privileges and can critically impact the confidentiality and availability of the system. We have responsibly disclosed the vulnerability to the maintainers, and 14% of the reported vulnerabilities have been remediated. We also investigated the root causes of the vulnerable code pattern and assessed a side effect of its many copies, which seem to have poisoned several popular LLMs. Our study highlights the urgent need to help secure the open-source ecosystem by leveraging scalable automated vulnerability management solutions and raising awareness among developers.
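
The "higher than 9.0" severity claim can be illustrated with the public CVSS v3.1 base score equations. The sketch below covers only the Scope: Unchanged case, and the vector in the usage lines (network attack vector, low complexity, no privileges, no user interaction, high confidentiality and availability impact, no integrity impact) is an assumption read off the abstract's wording, not a vector reported by the authors. The function names are hypothetical.

```javascript
// Metric weights from the CVSS v3.1 specification (Scope: Unchanged).
const W = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 },
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
};

// Round up to one decimal place, using the integer-based procedure
// that CVSS v3.1 defines to avoid floating-point artefacts.
function roundUp(x) {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

// Base score for a vector with Scope: Unchanged.
function baseScoreUnchanged({ AV, AC, PR, UI, C, I, A }) {
  const iss = 1 - (1 - W.CIA[C]) * (1 - W.CIA[I]) * (1 - W.CIA[A]);
  const impact = 6.42 * iss;
  const exploitability = 8.22 * W.AV[AV] * W.AC[AC] * W.PR[PR] * W.UI[UI];
  if (impact <= 0) return 0;
  return roundUp(Math.min(impact + exploitability, 10));
}

// Assumed vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:H
const assumed = { AV: "N", AC: "L", PR: "N", UI: "N", C: "H", I: "N", A: "H" };
console.log(baseScoreUnchanged(assumed)); // 9.1
```

Under these assumed metrics the score is 9.1, which falls in the Critical band (9.0 to 10.0). Dropping the availability impact to none yields 7.5, which shows how strongly the rating depends on what an attacker can do in a given project.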

Paper Structure

This paper contains 36 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Overall flowchart of the proposed pipeline.
  • Figure 2: Simplified JavaScript (Node.js) code vulnerable to the path traversal attack.
  • Figure 3: Sample patch (in green) generated using GPT-4 with our prompt.
  • Figure 4: Distribution of verified exploited samples per year.
  • Figure 5: Distribution of vulnerable code snippets generated by LLMs in different scenarios: (A) scenario 1, step 1; (B) scenario 1, step 2; (C) scenario 2.
  • ...and 6 more figures