RISC-V Word-Size Modular Instructions for Residue Number Systems
Laurent-Stéphane Didier, Jean-Marc Robert
TL;DR
This work targets fast modular multiplication of large integers by leveraging Residue Number Systems (RNS) on the RISC-V platform, and investigates how dedicated word-size modular arithmetic instructions can accelerate software RNS. It implements three custom RISC-V instructions ($\text{mulmod}$, $\text{addmod}$, and $\text{submod}$) and evaluates their impact across six configurations that combine base-extension methods (Kawamura and Szabo–Tanaka) with modular-reduction strategies (Pseudo-Mersenne, ST, and Instruction), using GEM5-based simulations for 64-bit word sizes and 8–64 RNS channels. Results show substantial speedups from the new instructions: 2.76×–3.06× over baseline modular reductions (depending on the configuration and processor model) and up to 25.79× when using Kawamura’s base extension on Out-of-Order CPUs. Compared with x86, RISC-V with the custom instructions requires about 4.5× fewer cycles in In-Order setups and 8× fewer cycles in Out-of-Order setups. These findings demonstrate the potential of ISA extensions for RNS-based cryptography and high-performance computing, and motivate future hardware realizations and vectorized approaches that exploit the parallelism of RNS computations. In such a system the dynamic range is $M=\prod_{i=1}^n m_i$, and a value is recovered from its residues $x_i$ by the Chinese Remainder Theorem as $x=\left|\sum_{i=1}^n x_i \left|M_i^{-1}\right|_{m_i} M_i\right|_M$, where $M_i = M/m_i$.
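As an illustration of the RNS representation and the reconstruction formula quoted above, the following is a minimal Python sketch, not the authors' implementation. The function names `to_rns` and `from_rns` and the small 8-bit moduli are assumptions chosen for readability; the paper works with 64-bit moduli and 8 to 64 channels.

```python
from math import prod  # Python 3.8+

# Toy base of pairwise coprime moduli (illustrative only, not from the paper).
MODULI = [251, 253, 255, 256]


def to_rns(x, moduli):
    """Forward conversion: one residue x_i = x mod m_i per channel."""
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """CRT reconstruction: x = | sum_i x_i * |M_i^{-1}|_{m_i} * M_i |_M."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i               # M_i = M / m_i
        inv = pow(M_i, -1, m_i)      # modular inverse of M_i modulo m_i
        total += x_i * inv * M_i
    return total % M


if __name__ == "__main__":
    x = 123456789                    # must be smaller than M to round-trip
    residues = to_rns(x, MODULI)
    print(residues, from_rns(residues, MODULI) == x)
```

The round trip only recovers `x` exactly when `0 <= x < M`, which is why the number and size of the channels determine the largest operands an RNS can represent.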
Abstract
Residue Number Systems (RNS) are parallel number systems that allow computation on large numbers. They are used in high-performance digital signal processing devices and cryptographic applications. However, the rigidity of the instruction set architectures of market-dominant microprocessors limits the use of such number systems in software applications. This article presents the impact of RISC-V instructions dedicated to word-size modular arithmetic on the software implementation of Residue Number Systems. We evaluate this impact on several sequential RNS modular multiplication algorithms. We observe that the fastest implementation uses the base extension of Kawamura et al. Simulations with the GEM5 simulator show that, on In-Order processors, RNS modular multiplication with Kawamura's base extension is 2.76 times faster with the dedicated word-size modular arithmetic instructions than with pseudo-Mersenne moduli. On Out-of-Order processors, it is more than 3 times faster. Compared to x86 architectures, RISC-V simulations show that using the dedicated instructions requires 4.5 times fewer cycles on In-Order processors and 8 times fewer on Out-of-Order ones.
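To make concrete what a word-size modular instruction would replace in software, here is a hedged Python sketch of the arithmetic behind the three operations and of a channel-wise RNS product. The three-operand signatures, the assumption that inputs are already reduced (`0 <= a, b < m`), and the helper `rns_mul` are illustrative; this section does not describe the actual instruction encoding or how the modulus is supplied.

```python
def addmod(a, b, m):
    """(a + b) mod m for reduced inputs: one conditional subtraction."""
    s = a + b
    return s - m if s >= m else s


def submod(a, b, m):
    """(a - b) mod m for reduced inputs: one conditional addition."""
    d = a - b
    return d + m if d < 0 else d


def mulmod(a, b, m):
    """(a * b) mod m. In software on a 64-bit machine this needs a
    double-word product followed by a reduction step."""
    return (a * b) % m


def rns_mul(xs, ys, moduli):
    """Channel-wise product: each channel is independent of the others."""
    return [mulmod(x, y, m) for x, y, m in zip(xs, ys, moduli)]


if __name__ == "__main__":
    m = 2**64 - 59                   # an illustrative 64-bit modulus
    print(addmod(m - 1, 1, m), submod(0, 1, m), mulmod(m - 1, m - 1, m))
    print(rns_mul([3, 4], [5, 6], [7, 11]))
```

Each call to `mulmod` stands for the multiply-then-reduce sequence that the pseudo-Mersenne strategy carries out with several ordinary instructions, which is the cost the comparison in the abstract measures.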
