CRYPTONITE: Scalable Accelerator Design for Cryptographic Primitives and Algorithms
Karthikeya Sharma Maheswaran, Camille Bossut, Andy Wanna, Qirun Zhang, Cong Hao
TL;DR
Cryptonite addresses the inefficiency of directly synthesizing straight-line cryptographic C code by automatically generating scalable, correct-by-design hardware accelerators. It uses a two-stage process: an E-graph Loop Synthesizer to reveal loopable, array-based representations from straight-line code, and a Hardware Implementation Explorer with a QoR estimator to perform Pareto-optimized design-space exploration. The approach yields numerous Pareto-optimal designs and, in single- and multi-kernel evaluations on Fiat Cryptography primitives, achieves substantial reductions in latency and resource usage versus naive or baseline methods. This automation enables scalable cryptographic accelerators in constrained environments, balancing performance with hardware resources across multiple curves and primitives.
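The Pareto-optimized exploration mentioned above can be illustrated with a minimal C sketch. It assumes each candidate design is summarized by only two estimated costs, latency and a single resource figure. The `DesignPoint` type and the `dominates` and `pareto_front` functions are hypothetical names chosen for illustration; they are not Cryptonite's actual interface or QoR model.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one candidate accelerator design. */
typedef struct {
    unsigned latency_cycles;  /* estimated latency, lower is better */
    unsigned resource_units;  /* estimated resource usage, lower is better */
} DesignPoint;

/* p dominates q if p is no worse in both costs and strictly better in one. */
static bool dominates(const DesignPoint *p, const DesignPoint *q) {
    return p->latency_cycles <= q->latency_cycles &&
           p->resource_units <= q->resource_units &&
           (p->latency_cycles < q->latency_cycles ||
            p->resource_units < q->resource_units);
}

/* Copies the non-dominated points of pts[0..n) into out (capacity >= n)
 * and returns how many were kept. Simple O(n^2) filter. */
size_t pareto_front(const DesignPoint *pts, size_t n, DesignPoint *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool dominated = false;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&pts[j], &pts[i])) {
                dominated = true;
                break;
            }
        }
        if (!dominated) {
            out[kept++] = pts[i];
        }
    }
    return kept;
}
```

A design survives the filter only if no other candidate is at least as good in both costs and strictly better in one, which is the sense in which a set of designs trades latency against resources.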
Abstract
Cryptographic primitives consist of repetitive operations applied to different inputs and are typically implemented as straight-line C code because they have traditionally been executed on CPUs. Computing these primitives is necessary for secure communication; thus, dedicated hardware accelerators are required in resource- and latency-constrained environments. High-Level Synthesis (HLS) generates hardware from high-level implementations in languages such as C, enabling rapid prototyping and evaluation of designs, which has led to its prominent use in developing dedicated hardware accelerators. However, directly synthesizing the straight-line C implementations of cryptographic primitives can lead to large hardware designs with excessive resource usage or suboptimal performance. We introduce Cryptonite, a tool that automatically generates efficient, synthesizable, and correct-by-design hardware accelerators for cryptographic primitives directly from straight-line C code. In its first stage, Cryptonite identifies high-level hardware constructs through verified rewriting, emphasizing resource reuse. In its second stage, it automatically explores latency-oriented implementations of the resulting compact design. Together, the two stages allow a particular accelerator to be scaled flexibly to meet given hardware requirements. We demonstrate Cryptonite's effectiveness using implementations from the Fiat Cryptography project, a library of verified and auto-generated cryptographic primitives for elliptic-curve cryptography. Our results show that Cryptonite achieves scalable designs with up to 88.88% reduced resource usage and a 54.31% improvement in latency compared to naively synthesized designs.
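To make the contrast between straight-line and loop-based code concrete, the following C sketch shows a simplified limb-wise addition written both ways. It is an illustrative assumption, not code taken from Fiat Cryptography or produced by Cryptonite: the function names, the choice of five 64-bit limbs, and the omission of carry handling are all hypothetical simplifications.

```c
#include <stdint.h>

#define N_LIMBS 5 /* assumption: field element stored as five 64-bit limbs */

/* Straight-line style: each limb operation is spelled out as its own
 * statement, which a naive HLS flow may map to separate hardware. */
void add_straight_line(uint64_t out[N_LIMBS],
                       const uint64_t a[N_LIMBS],
                       const uint64_t b[N_LIMBS]) {
    uint64_t x0 = a[0] + b[0];
    uint64_t x1 = a[1] + b[1];
    uint64_t x2 = a[2] + b[2];
    uint64_t x3 = a[3] + b[3];
    uint64_t x4 = a[4] + b[4];
    out[0] = x0;
    out[1] = x1;
    out[2] = x2;
    out[3] = x3;
    out[4] = x4;
}

/* Loop-based style: the same computation as one array loop. A single
 * adder can be reused across iterations, or the loop can be unrolled
 * or pipelined to trade resources for latency. */
void add_looped(uint64_t out[N_LIMBS],
                const uint64_t a[N_LIMBS],
                const uint64_t b[N_LIMBS]) {
    for (int i = 0; i < N_LIMBS; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result. The loop form exposes the repetition explicitly, which is the kind of structure that lets an HLS tool choose between reusing one operator and replicating it.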
