
ParsEval: Evaluation of Parsing Behavior using Real-world Out-in-the-wild X.509 Certificates

Stefan Tatschner, Sebastian N. Peters, Michael P. Heinl, Tobias Specht, Thomas Newe

TL;DR

The paper examines user-visible and security-related differences among X.509 certificate parsers by running real-world certificates through six production-ready libraries and comparing results. Using the 11.6 TiB Censys dataset, it applies a large-scale, differential parsing approach and an explicit error taxonomy to quantify performance, error rates, and failure modes. Key findings reveal pronounced disparities, with wolfSSL and Mbed TLS exhibiting about 13% parsing failures, OpenSSL showing near-zero failures, and diverse error patterns across implementations, including notable regressions in wolfSSL. The work highlights the need for standardized, automated testing to improve interoperability and security in the heterogeneous X.509 ecosystem, and provides a foundation for future investigations into the security implications of parser differences.

Abstract

X.509 certificates play a crucial role in establishing secure communication over the internet by enabling authentication and data integrity. Equipped with a rich feature set, the X.509 standard is defined by multiple, comprehensive ISO/IEC documents. Because of its internet-wide usage, implementations exist in multiple programming languages, leading to a large and fragmented ecosystem. This work addresses the research question "Are there user-visible and security-related differences between X.509 certificate parsers?". Relevant libraries offering APIs for parsing X.509 certificates were investigated and an appropriate test suite was developed. Of 34 libraries, 6 were chosen for further analysis. The X.509 parsing modules of the chosen libraries were called with 186,576,846 different certificates from a real-world dataset, and the observed error codes were investigated. This study reveals an anomaly in wolfSSL's X.509 parsing module and fundamental differences across the ecosystem. While related studies mostly focus on fuzzing techniques that produce artificial certificates, this study confirms that available X.509 parsing modules differ substantially and yield different results, even for real-world out-in-the-wild certificates.
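
The differential idea described in the abstract (feed the same certificate to several parsers and compare the observed outcomes) can be sketched as follows. This is a minimal illustration, not the authors' test suite: `strict_parser`, `lenient_parser`, `PARSERS`, `differential_parse`, and `disagrees` are hypothetical names, and the two toy parsers only stand in for real library bindings.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real library bindings. Each takes DER bytes
# and raises an exception when it rejects the input.
def strict_parser(der: bytes) -> None:
    # Assumption for illustration: accept only inputs of at least 4 bytes
    # that start with the DER SEQUENCE tag (0x30).
    if len(der) < 4 or der[0] != 0x30:
        raise ValueError("not a DER SEQUENCE")

def lenient_parser(der: bytes) -> None:
    # Assumption for illustration: accept any non-empty input.
    if not der:
        raise ValueError("empty input")

PARSERS: Dict[str, Callable[[bytes], None]] = {
    "strict": strict_parser,
    "lenient": lenient_parser,
}

def differential_parse(
    der: bytes, parsers: Dict[str, Callable[[bytes], None]]
) -> Dict[str, str]:
    """Run every parser on the same input and record 'ok' or the error name."""
    results = {}
    for name, parse in parsers.items():
        try:
            parse(der)
            results[name] = "ok"
        except Exception as exc:  # each library reports errors differently
            results[name] = type(exc).__name__
    return results

def disagrees(results: Dict[str, str]) -> bool:
    """True when at least two parsers report different outcomes."""
    return len(set(results.values())) > 1
```

In a real setup, each entry in `PARSERS` would wrap one library's parsing API and map its native error code into a shared error taxonomy before the comparison.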

Paper Structure

This paper contains 26 sections, 1 equation, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Schematic figure of the test setup.
  • Figure 2: A violin plot of the normalized parsing duration for one certificate file from the chosen dataset. One batch file contains between $\approx 70{,}000$ and $\approx 150{,}000$ certificates. 2,000 batch files were processed for this plot. The parsing duration of one certificate was calculated by dividing the parsing duration of a batch file by the number of contained certificates.
  • Figure 3: Normalized distribution of error categories among the tested libraries.
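
The two normalizations mentioned in the captions of Figures 2 and 3 can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and the sample numbers are hypothetical, and the error-category normalization is assumed to be a simple per-library share of the total, which the captions do not spell out.

```python
from typing import Dict

def per_certificate_duration(batch_seconds: float, n_certificates: int) -> float:
    """Figure 2 normalization: batch parsing duration divided by certificate count."""
    if n_certificates <= 0:
        raise ValueError("a batch must contain at least one certificate")
    return batch_seconds / n_certificates

def normalize_categories(counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed Figure 3 normalization: each category's share of all observed errors."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

# Hypothetical numbers, chosen only to illustrate the arithmetic.
duration = per_certificate_duration(7.0, 70_000)
shares = normalize_categories({"encoding": 3, "unsupported": 1})
```

Dividing by the batch size makes batches of different sizes comparable, and dividing by the error total does the same for libraries with very different absolute error counts.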