Faster Post-Quantum TLS 1.3 Based on ML-KEM: Implementation and Assessment
Jieyu Zheng, Haoliang Zhu, Yifan Dong, Zhenyu Song, Zhenhao Zhang, Yafang Yang, Yunlei Zhao
TL;DR
This work addresses the handshake performance bottleneck of post-quantum TLS 1.3 by optimizing ML-KEM with AVX-512 and introducing an 8-way batch key-generation method, then integrating the optimized KEM into TLS 1.3 via the OQS/liboqs framework. It demonstrates up to a 1.64× speedup for ML-KEM over AVX2, and 3.5×–4.9× improvements for batch key generation, while evaluating IND-1-CCA KEM constructions (T_CH and T_H/T_RH) and showing improved TLS handshake throughput in PQ-only and, to a lesser extent, hybrid configurations. The paper provides a practical path to faster PQ-TLS handshakes and offers guidance on selecting IND-1-CCA KEM constructions for TLS 1.3, supported by empirical TLS benchmarks. The work also contributes open-source AVX-512 ML-KEM code integrated with TLS 1.3, enabling researchers and practitioners to adopt and extend post-quantum security in real-world secure communications.
Abstract
TLS is widely used for secure data transmission over networks. However, with the advent of quantum computers, the security of TLS based on traditional public-key cryptography is under threat. To counter quantum threats, post-quantum algorithms must be integrated into TLS. Most PQ-TLS research focuses on integration and evaluation, and few studies address improving PQ-TLS performance by optimizing the underlying PQC implementation. For the TLS protocol, handshake performance is crucial, and for post-quantum TLS (PQ-TLS) the performance of post-quantum key encapsulation mechanisms (KEMs) directly impacts handshake performance. In this work, we study the impact of post-quantum KEMs on PQ-TLS performance. We show how to improve ML-KEM performance using AVX-512, Intel's latest Advanced Vector Extensions instruction set. We detail a range of techniques devised to parallelize polynomial multiplication, modular reduction, and other computationally intensive modules within ML-KEM. Our optimized ML-KEM implementation achieves up to a 1.64x speedup compared to the latest AVX2 implementation. Furthermore, we introduce a novel batch key generation method for ML-KEM that integrates seamlessly into the TLS protocol. The batch method accelerates the key generation procedure by 3.5x to 4.9x. We integrate the optimized AVX-512 implementation of ML-KEM into TLS 1.3 and assess handshake performance under both PQ-only and hybrid modes. The assessment demonstrates that our faster ML-KEM implementation yields a higher number of TLS 1.3 handshakes per second under both modes. Additionally, we revisit two IND-1-CCA KEM constructions discussed at Eurocrypt'22 and Asiacrypt'23. We implement both on top of ML-KEM and integrate the better-performing one into TLS 1.3, with benchmarks.
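To make the "modular reduction" step mentioned in the abstract concrete, the following is a minimal scalar sketch of signed Montgomery reduction for the ML-KEM modulus q = 3329, the kind of per-coefficient operation that vectorized implementations apply to many lanes at once. It is an illustration only and not the authors' AVX-512 code: the function names (`montgomery_reduce`, `reduce_lanes`), the choice of radix 2^16, and the 8-element list standing in for independent lanes are assumptions made for this example. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
Q = 3329                          # ML-KEM modulus
R_BITS = 16                       # assumed Montgomery radix R = 2^16
QINV = pow(Q, -1, 1 << R_BITS)    # q^{-1} mod 2^16


def to_signed16(x: int) -> int:
    """Wrap an integer to a signed 16-bit value, as a 16-bit SIMD lane would."""
    x &= 0xFFFF
    return x - 0x10000 if x >= 0x8000 else x


def montgomery_reduce(a: int) -> int:
    """Return r with r = a * 2^-16 (mod Q) and -Q < r < Q, for |a| < Q * 2^15."""
    t = to_signed16(a * QINV)     # low 16 bits of a * q^{-1}, signed
    return (a - t * Q) >> R_BITS  # a - t*Q is an exact multiple of 2^16


def reduce_lanes(lanes):
    """Apply the same reduction to every lane (hypothetical stand-in for SIMD)."""
    return [montgomery_reduce(a) for a in lanes]


if __name__ == "__main__":
    products = [1234 * 567, -3000 * 3000, 0, 3328 * 3328, 1, -1, 65536, 99999]
    print(reduce_lanes(products))
```

In a real vectorized implementation each list element would correspond to one 16-bit lane of a SIMD register, and the same sequence of multiply, wrap, and shift would be issued once for all lanes; the scalar loop here only mirrors the arithmetic.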
