Cross-Ecosystem Vulnerability Analysis for Python Applications

Georgios Alexopoulos, Nikolaos Alexopoulos, Thodoris Sotiropoulos, Charalambos Mitropoulos, Zhendong Su, Dimitris Mitropoulos

Abstract

Python applications depend on native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these libraries, determining which Python packages are affected requires cross-ecosystem analysis spanning Python dependency graphs and OS package versions. Current vulnerability scanners produce false negatives by missing vendored vulnerabilities and false positives by ignoring security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis achieves up to 97% false positive reduction compared to upstream version matching.
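The stitching and reachability step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge-list representation, the `stitch` and `reachable` helpers, and all node names below are hypothetical, chosen only to show how a Python call graph and a binary call graph might be joined across a dependency boundary and then searched for a vulnerable function.

```python
from collections import deque


def stitch(python_edges, bridge_edges, binary_edges):
    """Merge three edge lists into one adjacency map (hypothetical helper).

    python_edges: calls between Python functions
    bridge_edges: calls from Python into native code (the dependency boundary)
    binary_edges: calls between functions inside shared libraries
    """
    graph = {}
    for src, dst in [*python_edges, *bridge_edges, *binary_edges]:
        graph.setdefault(src, set()).add(dst)
    return graph


def reachable(graph, entry, target):
    """Breadth-first search: is `target` reachable from `entry`?"""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Illustrative node names only; they are assumptions, not data from the paper.
python_edges = [("app.main", "pkg.load"), ("pkg.plot", "pkg._draw")]
bridge_edges = [("pkg.load", "ext.so:parse"), ("pkg._draw", "ext.so:render")]
binary_edges = [("ext.so:parse", "libA.so:vulnerable_parse"),
                ("ext.so:render", "libB.so:vulnerable_render")]

xecg = stitch(python_edges, bridge_edges, binary_edges)
parse_hit = reachable(xecg, "app.main", "libA.so:vulnerable_parse")
render_hit = reachable(xecg, "app.main", "libB.so:vulnerable_render")
```

In this toy graph the application never calls `pkg.plot`, so the function in `libB.so` is present on disk but not reachable, which is the kind of distinction a purely version-based scanner cannot make.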

Paper Structure (26 sections, 1 equation, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Reachability of vulnerable binary functions from the scppin Python package installed via pip on a Debian host, as computed by our approach. CVE-2025-50422 (https://nvd.nist.gov/vuln/detail/cve-2025-50422) is not reachable, as scppin does not call igraph.plot(). State-of-the-art scanners such as Trivy can only detect the presence of the system libcairo (green box); they miss the vulnerable libxml2 library (orange box) vendored by igraph and do not recover its provenance from a Red Hat package (explained in Figure 2).
  • Figure 2: Workflow of the PyPA auditwheel tool for bundling shared libraries into Python wheels. After a package's native extensions are built, auditwheel locates their shared library dependencies in the build environment, copies them into the wheel, and appends 8 hexadecimal characters derived from the original binary's SHA-256 hash to the vendored filename. Here we depict the packaging process of igraph, also studied in our motivating example (Figure 1).
  • Figure 3: High-level overview of our approach for (a) constructing cross-ecosystem call graphs (XECGs) for Python applications and their dependencies, and (b) utilizing XECGs to compute vulnerability propagation from binary code.
  • Figure 4: Python dependency tree of the scppin package corresponding to the example of Figure 1. The tree is built by recursively examining each package’s .whl metadata.
  • Figure 5: Binary dependency graph of the igraph Python package (version 0.11.9). Pink nodes indicate native extensions, gray nodes indicate vendored libraries, and blue nodes indicate system dependencies. The nodes in the graph are later tagged with their provenance (shown in bold) based on our provenance analysis. Notably, vendored libraries originate from Red Hat packages, although the Python application is deployed on a Debian host.
  • ...and 3 more figures
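
The filename convention described in Figure 2 suggests one simple way a vendored library's name could help narrow provenance candidates. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes the 8-character tag is the leading 8 hex digits of the original file's SHA-256 digest and that it sits between the library stem and the `.so` suffix; the helper names `split_vendored_name` and `matches_tag` are hypothetical.

```python
import hashlib
import re

# Assumed layout: <stem>-<8 hex chars>.so[.N[.N...]], e.g. libfoo-1a2b3c4d.so.2
VENDORED_NAME = re.compile(
    r"^(?P<stem>.+)-(?P<tag>[0-9a-f]{8})(?P<ext>\.so(?:\.\d+)*)$"
)


def split_vendored_name(filename):
    """Return (original_name, tag) for a vendored filename, or None."""
    match = VENDORED_NAME.match(filename)
    if match is None:
        return None
    return match["stem"] + match["ext"], match["tag"]


def matches_tag(candidate_bytes, tag):
    """Check whether a candidate original binary is consistent with the tag.

    Assumption: the tag is the first 8 hex digits of the SHA-256 digest.
    """
    return hashlib.sha256(candidate_bytes).hexdigest()[:8] == tag


# Self-contained demonstration with made-up bytes standing in for a binary.
fake_binary = b"not a real shared library"
tag = hashlib.sha256(fake_binary).hexdigest()[:8]
vendored = f"libxml2-{tag}.so.2"

parsed = split_vendored_name(vendored)
consistent = matches_tag(fake_binary, parsed[1])
```

An 8-character prefix is only a filter: it can rule candidates out cheaply, but a match would still need to be confirmed against the full content hash of the candidate artifact.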