Improving Discovery of Known Software Vulnerability For Enhanced Cybersecurity
Devesh Sawant, Manjesh K. Hanawal, Atul Kabra
TL;DR
This paper tackles the problem of detecting known software vulnerabilities when vendor naming and versioning practices yield non-standardized CPE strings, which impede accurate vulnerability matching. It introduces Vajra, a scalable pipeline that combines data collection via osquery, a multi-layer sanitization process, prioritized union queries, and fuzzy matching with RapidFuzz to map software to CPEs and then to CVEs from the NVD. The approach improves detection accuracy by about 40% over a baseline FleetDM implementation, with the most notable gains arising from more robust sanitization and tolerance to naming variations. The work has practical implications for proactive vulnerability management, enabling faster and more reliable identification of vulnerable software in real-world environments, and it outlines avenues for real-time data integration and expanded vulnerability sources.
Abstract
Software vulnerabilities are commonly exploited as attack vectors in cyberattacks, so it is crucial to identify vulnerable software configurations early and apply preventive measures. Effective vulnerability detection relies on identifying vulnerable software through standardized identifiers such as Common Platform Enumeration (CPE) strings. However, non-standardized CPE strings issued by software vendors create a significant challenge: inconsistent formats, naming conventions, and versioning practices lead to mismatches when querying databases like the National Vulnerability Database (NVD), hindering accurate vulnerability detection. Failure to properly identify and prioritize vulnerable software complicates patching and delays updates, giving attackers a window of opportunity. To address this, we present a method that enhances CPE string consistency by combining a multi-layered sanitization process with a fuzzy matching algorithm, applied to data collected using osquery. Our method includes a union query with priority weighting, which assigns relevance to various attribute combinations, followed by a fuzzy matching process with threshold-based similarity scoring, yielding higher confidence in accurate matches. Comparative analysis with open-source tools such as FleetDM demonstrates that our approach improves detection accuracy by 40%.
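
The sanitize, prioritize, and fuzzy-match idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The CPE inventory, the noise-token list, the priority weights, the threshold of 80, and the helper names (`sanitize`, `match_cpe`, `to_cpe`) are all assumptions made for illustration. The standard-library `difflib.SequenceMatcher` stands in for RapidFuzz so that the example runs without third-party packages.

```python
import re
from difflib import SequenceMatcher

# Hypothetical (vendor, product) pairs standing in for an NVD-derived CPE dictionary.
KNOWN_CPES = [
    ("google", "chrome"),
    ("mozilla", "firefox"),
    ("videolan", "vlc_media_player"),
]

# Illustrative noise tokens often found in installed-software names (assumed list).
NOISE = re.compile(r"\b(inc|corp|corporation|llc|ltd|x64|x86|64-bit|32-bit)\b\.?")


def sanitize(text: str) -> str:
    """Layered clean-up: lowercase, drop qualifiers, noise tokens, and versions."""
    text = text.lower()
    text = re.sub(r"\(.*?\)", " ", text)          # parenthesised qualifiers
    text = NOISE.sub(" ", text)                   # legal suffixes, architecture tags
    text = re.sub(r"\b\d+(\.\d+)+\b", " ", text)  # embedded versions such as 3.0.18
    text = re.sub(r"[^a-z0-9]+", "_", text)       # CPE-style underscore separators
    return text.strip("_")


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale (difflib used here in place of RapidFuzz)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def match_cpe(vendor: str, name: str, threshold: float = 80.0):
    """Return (score, cpe_vendor, cpe_product) for the best match, or None."""
    v, n = sanitize(vendor), sanitize(name)
    best = None
    for cpe_vendor, cpe_product in KNOWN_CPES:
        cv, cp = sanitize(cpe_vendor), sanitize(cpe_product)
        # Attribute combinations with assumed priority weights.
        combos = [
            (1.00, f"{v}_{n}", f"{cv}_{cp}"),  # vendor + product
            (0.95, n, cp),                      # product only
            (0.90, n, f"{cv}_{cp}"),            # product name that embeds the vendor
        ]
        for weight, query, target in combos:
            score = weight * similarity(query, target)
            if score >= threshold and (best is None or score > best[0]):
                best = (score, cpe_vendor, cpe_product)
    return best


def to_cpe(vendor: str, product: str, version: str) -> str:
    """Format a CPE 2.3 string for an application, wildcarding the remaining fields."""
    return f"cpe:2.3:a:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


if __name__ == "__main__":
    hit = match_cpe("VideoLAN", "VLC media player 3.0.18 (64-bit)")
    if hit is not None:
        print(to_cpe(hit[1], hit[2], "3.0.18"))
```

The weights make an exact vendor-plus-product match outrank a product-only match, and the threshold rejects weak candidates rather than reporting a low-confidence CPE. In practice both would need tuning against labelled data.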
