CVE Breadcrumbs: Tracking Vulnerabilities Through Versioned Apache Libraries
Derek Garcia, Briana Lee, Ibrahim Matar, David Rickards, Andrew Zilnicki
TL;DR
This paper characterizes the vulnerability landscape of the Apache ecosystem by constructing a dataset of CVEs and CWEs that covers 24,285 Apache libraries and 574,581 JARs and spans 2005–2025. Using a three-stage pipeline that leverages Gravengraven, Grype, NVD, and MITRE, the authors trace when each CVE is present, disclosed, and remediated to reveal lifecycle patterns. Key findings include CWE-502 as the most frequent weakness, a median CVE patch time of 117 days, an average disclosure time of 2,104 days, and a median remediation time of 99 days, with notable long-tail delays driven by transitive dependencies. The work offers actionable insights for secure coding, vulnerability monitoring, and remediation strategies, and it suggests extending the methodology to additional ecosystems to strengthen software supply chain security.
Abstract
The Apache Software Foundation (ASF) ecosystem underpins a vast portion of modern software infrastructure, powering widely used components such as Log4j, Tomcat, and Struts. However, this ubiquity has made these libraries prime targets, and incidents like Log4Shell show that widely adopted Apache projects are not immune to recurring and severe security weaknesses. We conduct a historical analysis of the Apache ecosystem to follow the "breadcrumb trail of vulnerabilities" by compiling a comprehensive dataset of Common Vulnerabilities and Exposures (CVEs) and Common Weakness Enumerations (CWEs). We examine trends in exploit recurrence, disclosure timelines, and remediation practices. Our analysis is guided by four key research questions: (1) What are the most persistent and repeated CWEs in Apache libraries? (2) How long do CVEs persist before being addressed? (3) What is the delay between CVE introduction and official disclosure? and (4) How long after disclosure are CVEs remediated? We present a detailed timeline of vulnerability lifecycle stages across Apache libraries and offer insights to improve secure coding practices, vulnerability monitoring, and remediation strategies. Our contributions include a curated dataset covering 24,285 Apache libraries, 1,285 CVEs, and 157 CWEs, along with empirical findings and developer-focused recommendations.
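To make the three timing questions concrete, the following is a minimal sketch of how the lifecycle intervals could be derived from per-CVE dates. It is not the authors' pipeline: the `CveRecord` type, its field names, the interval definitions, the placeholder CVE identifiers, and the sample dates are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class CveRecord:
    """Hypothetical record; field names are illustrative, not from the dataset."""
    cve_id: str
    introduced: date  # release date of the first affected version
    disclosed: date   # official publication date of the CVE
    patched: date     # release date of the first fixed version


def lifecycle_days(record: CveRecord) -> dict[str, int]:
    """Return the three intervals, in days, under the assumed definitions."""
    return {
        # RQ2 (assumed): introduction until a fix is released
        "persistence": (record.patched - record.introduced).days,
        # RQ3 (assumed): introduction until official disclosure
        "disclosure_delay": (record.disclosed - record.introduced).days,
        # RQ4 (assumed): disclosure until a fix is released;
        # negative if the fix shipped before disclosure
        "remediation_delay": (record.patched - record.disclosed).days,
    }


# Made-up sample data with placeholder identifiers.
records = [
    CveRecord("CVE-0000-0001", date(2020, 1, 1), date(2020, 3, 1), date(2020, 3, 31)),
    CveRecord("CVE-0000-0002", date(2021, 1, 1), date(2021, 1, 11), date(2021, 1, 21)),
]

intervals = [lifecycle_days(r) for r in records]
median_persistence = median(i["persistence"] for i in intervals)
```

Reporting a median alongside a mean, as the summary does, is a common choice for this kind of data because a few very long-lived CVEs can dominate the average.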
