Vexed by VEX tools: Consistency evaluation of container vulnerability scanners
Yekatierina Churakova, Mathias Ekstedt
TL;DR
This study evaluates the consistency of state-of-the-art VEX-producing tools applied to Docker container images using Jaccard and Tversky similarity measures. Despite multiple configurations and input modalities, the results show overall low cross-tool agreement, with Trivy and Grype yielding the highest CVE-based alignment (~0.76) and many tool pairs showing near-zero overlap. The analysis highlights factors such as input formats, vulnerability identifiers, and reliance on different vulnerability databases as major contributors to inconsistency, suggesting that the VEX tooling space is still immature. The work underscores the need for standardized identifiers, improved SBOM quality, and potentially integrating explicit exploitability data to enhance the reliability of vulnerability reporting in software supply chains.
Abstract
This paper presents a study of state-of-the-art vulnerability scanning tools applied to containers. We focus on tools that follow the Vulnerability Exploitability eXchange (VEX) format, which was introduced to complement Software Bills of Materials (SBOMs) with security advisories about known vulnerabilities. An accurate understanding of the vulnerabilities present in third-party software dependencies is critical for secure software development and risk analysis. Because estimating the exact accuracy and precision of a vulnerability scanner is an overwhelming challenge, we instead set out to explore how consistently different tools perform. By doing so, we aim to assess the maturity of the VEX tool field as a whole, rather than that of any particular tool. We use the Jaccard and Tversky indices to produce similarity scores of tool performance for several datasets created from container images. Overall, our results show a low level of consistency among the tools, indicating a low level of maturity in the VEX tool space. We performed a number of experiments to find an explanation for these results, but they are largely inconclusive, and further research is needed to understand the underlying causes of our findings.
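To make the two similarity measures concrete, the following is a minimal sketch of how the Jaccard and Tversky indices could be computed over the sets of vulnerability identifiers reported by two tools. The function names, the example identifiers, and the `alpha`/`beta` weights are illustrative assumptions, not the implementation or the parameter values used in the study.

```python
def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty reports are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky index: |A ∩ B| / (|A ∩ B| + alpha*|A - B| + beta*|B - A|).

    alpha = beta = 1 reduces to the Jaccard index;
    alpha = beta = 0.5 reduces to the Dice coefficient.
    Unequal weights make the measure asymmetric.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    if denom == 0:
        # Assumption: same convention as above for degenerate inputs.
        return 1.0
    return common / denom


# Hypothetical findings from two scanners run on the same image.
tool_a = {"CVE-2021-0001", "CVE-2021-0002", "CVE-2021-0003"}
tool_b = {"CVE-2021-0002", "CVE-2021-0003", "CVE-2021-0004"}

print(jaccard(tool_a, tool_b))
print(tversky(tool_a, tool_b, alpha=1.0, beta=0.0))
```

With `alpha=1.0` and `beta=0.0`, the Tversky index penalizes only findings unique to the first tool, so it measures the share of the first tool's findings that the second tool also reports. This asymmetry is what distinguishes it from the Jaccard index when one tool reports far more findings than the other.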
