gECC: A GPU-based high-throughput framework for Elliptic Curve Cryptography

Qian Xiong, Weiliang Ma, Xuanhua Shi, Yongluan Zhou, Hai Jin, Kaiyi Huang, Haozhou Wang, Zhengru Wang

TL;DR

The paper tackles the ECC performance bottleneck in throughput-sensitive applications by proposing gECC, a GPU-optimized framework that batches EC operations using Montgomery's trick and employs GAS for batch modular inversion. It combines data-locality-aware kernel fusion and multi-level cache management to minimize memory overhead, and it introduces SM2-specific modular reduction optimizations to reduce the number of IMAD instructions. Empirical results on an NVIDIA A100 show speedups of up to 5.56x for ECDSA verification and 4.94x for ECDH over state-of-the-art GPU systems, along with improvements in batch point multiplication (PMUL) and modular multiplication, and a 1.56x throughput gain in a real blockchain workload over a state-of-the-art CPU-based system. The work targets high-throughput crypto services in blockchain, verifiable databases, and secure cloud services, and the gECC framework is available as open source.

Abstract

Elliptic Curve Cryptography (ECC) is an encryption method that provides security comparable to traditional techniques like Rivest-Shamir-Adleman (RSA) but with lower computational complexity and smaller key sizes, making it a competitive option for applications such as blockchain, secure multi-party computation, and database security. However, the throughput of ECC is still hindered by the significant performance overhead associated with elliptic curve (EC) operations. This paper presents gECC, a versatile framework for ECC optimized for GPU architectures, specifically engineered to achieve high-throughput performance in EC operations. gECC incorporates batch-based execution of EC operations and microarchitecture-level optimization of modular arithmetic. It employs Montgomery's trick to enable batch EC computation and incorporates novel computation parallelization and memory management techniques to maximize computation parallelism and minimize the access overhead of GPU global memory. We also analyze the primary bottleneck in modular multiplication by investigating how the user code for modular multiplication is compiled into hardware instructions and what the issue rates of these instructions are. We identify that the efficiency of modular multiplication is highly dependent on the number of Integer Multiply-Add (IMAD) instructions. To eliminate this bottleneck, we propose techniques to minimize the number of IMAD instructions by leveraging predicate registers to pass the carry information and using addition and subtraction instructions (IADD3) to replace IMAD instructions. Our results show that, for ECDSA and ECDH, gECC achieves performance improvements of 5.56x and 4.94x, respectively, compared to the state-of-the-art GPU-based system. In a real-world blockchain application, it achieves a performance improvement of 1.56x compared to the state-of-the-art CPU-based system.
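
For readers unfamiliar with Montgomery's trick, which the abstract names as the basis for batch EC computation, the following is a minimal sequential sketch in Python. It is not the gECC implementation: the function name `batch_inverse` and the modulus `P` (the Curve25519 field prime, chosen only for illustration) are assumptions, and the GPU parallelization the paper describes is not modeled. The sketch shows only the core idea: n modular inverses are obtained from a single modular inversion plus roughly 3n modular multiplications.

```python
def batch_inverse(values, p):
    """Invert every element of `values` modulo the prime `p`
    using one modular inversion (Montgomery's trick)."""
    n = len(values)
    if n == 0:
        return []

    # Forward pass: prefix[i] = values[0] * ... * values[i] (mod p).
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        if v % p == 0:
            raise ZeroDivisionError(f"element {i} is not invertible mod p")
        acc = acc * v % p
        prefix[i] = acc

    # The only modular inversion: invert the product of all inputs.
    inv_acc = pow(acc, -1, p)  # requires Python 3.8+

    # Backward pass: peel off one factor at a time.
    inverses = [0] * n
    for i in range(n - 1, 0, -1):
        # inv_acc == (values[0] * ... * values[i])^-1 at this point.
        inverses[i] = inv_acc * prefix[i - 1] % p
        inv_acc = inv_acc * values[i] % p
    inverses[0] = inv_acc
    return inverses


# Illustrative modulus (assumption): the Curve25519 field prime.
P = 2**255 - 19
sample = [3, 5, 7, 123456789, P - 2]
result = batch_inverse(sample, P)
```

Each entry of `result` multiplied by the corresponding entry of `sample` is congruent to 1 modulo `P`. Because a modular inversion is far more expensive than a modular multiplication, amortizing one inversion over a large batch is what makes batching affine-coordinate EC operations attractive.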

Paper Structure (26 sections, 5 equations, 14 figures, 5 tables, 2 algorithms)


Figures (14)

  • Figure 1: Overview of gECC Framework.
  • Figure 2: An example of Montgomery's trick.
  • Figure 3: Different mechanisms of batch modular inversion on GPU and the corresponding runtime execution.
  • Figure 4: Multi-level cache management for batch PADD.
  • Figure 5: The data layout of n EC points.
  • ...and 9 more figures