
RISC-V Word-Size Modular Instructions for Residue Number Systems

Laurent-Stéphane Didier, Jean-Marc Robert

TL;DR

This work targets fast modular multiplication of large integers using Residue Number Systems (RNS) on the RISC-V platform, and investigates how dedicated word-size modular arithmetic instructions can accelerate software RNS. It implements three custom RISC-V instructions ($\text{mulmod}$, $\text{addmod}$, and $\text{submod}$) and evaluates their impact across six configurations that combine two base-extension methods (Kawamura and Szabo–Tanaka) with three modular-reduction strategies (Pseudo-Mersenne, ST, and Instruction), using GEM5-based simulations for 64-bit word sizes and 8–64 RNS channels. The new instructions yield 2.76× to 3.06× speedups over baseline modular reductions (depending on the configuration and processor model) and gains of up to 25.79× when using Kawamura's base extension on Out-of-Order CPUs. Compared with x86, RISC-V with the custom instructions requires about 4.5× fewer cycles on In-Order models and 8× fewer cycles on Out-of-Order models. These findings demonstrate the potential of ISA extensions for RNS-based cryptography and high-performance computing, and motivate future hardware realizations and vectorized approaches that exploit the parallelism of RNS computations. In this setting, the moduli $m_1,\dots,m_n$ are pairwise coprime with product $M=\prod_{i=1}^n m_i$, and a value is recovered from its residues $x_i$ as $x=\left|\sum_{i=1}^n x_i \left(\frac{M}{m_i}\right)_{m_i}^{-1} M_i\right|_M$, where $M_i = M/m_i$.
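To make the role of the three operations concrete, the following is a minimal software sketch, not the paper's implementation. It models $\text{addmod}$, $\text{submod}$, and $\text{mulmod}$ as plain C functions, assuming operands are already reduced modulo $m$, and uses them for a channel-wise RNS multiplication followed by the reconstruction formula above. The function names `inverse_mod`, `crt_reconstruct`, and `rns_mul_demo`, the toy moduli $\{7, 11, 13\}$, and the brute-force inverse are illustrative assumptions. `unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software models of the three operations (assumption: a, b < m). */
uint64_t addmod(uint64_t a, uint64_t b, uint64_t m) {
    assert(a < m && b < m);
    uint64_t s = a + b;                      /* may wrap when m is close to 2^64 */
    return (s < a || s >= m) ? s - m : s;    /* wrap or s >= m: subtract m once */
}

uint64_t submod(uint64_t a, uint64_t b, uint64_t m) {
    assert(a < m && b < m);
    return (a >= b) ? a - b : a + (m - b);
}

uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
    assert(m != 0);
    return (uint64_t)(((unsigned __int128)a * b) % m);
}

/* Brute-force inverse of a modulo m; only suitable for toy moduli. */
uint64_t inverse_mod(uint64_t a, uint64_t m) {
    for (uint64_t v = 1; v < m; v++)
        if (mulmod(a, v, m) == 1) return v;
    return 0;                                /* no inverse exists */
}

/* x = | sum_i x_i * (M/m_i)^{-1}_{m_i} * M_i |_M with M_i = M/m_i.
   Assumes pairwise coprime moduli whose product M fits in 64 bits. */
uint64_t crt_reconstruct(const uint64_t *x, const uint64_t *m, size_t n) {
    uint64_t M = 1;
    for (size_t i = 0; i < n; i++) M *= m[i];
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t Mi  = M / m[i];
        uint64_t inv = inverse_mod(Mi % m[i], m[i]);
        /* Reducing x_i * inv modulo m_i first gives the same value modulo M. */
        uint64_t t   = mulmod(x[i], inv, m[i]);
        acc = addmod(acc, mulmod(t, Mi, M), M);
    }
    return acc;
}

/* Multiply 123 by 45 channel by channel, then reconstruct modulo M = 1001. */
uint64_t rns_mul_demo(void) {
    const uint64_t m[3] = {7, 11, 13};
    const uint64_t x = 123, y = 45;
    uint64_t z[3];
    for (size_t i = 0; i < 3; i++)
        z[i] = mulmod(x % m[i], y % m[i], m[i]);  /* independent channels */
    return crt_reconstruct(z, m, 3);
}
```

With these toy values, `rns_mul_demo()` returns 530, which equals $123 \times 45 \bmod 1001$. Each channel only ever needs one word-size modular operation, which is the work a dedicated instruction would take over.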

Abstract

Residue Number Systems (RNS) are parallel number systems that allow computation on large numbers. They are used in high-performance digital signal processing devices and cryptographic applications. However, the rigidity of the instruction set architectures of market-dominant microprocessors limits the use of such number systems in software applications. This article presents the impact of RISC-V instructions dedicated to word-size modular arithmetic on the software implementation of Residue Number Systems. We evaluate this impact on several sequential RNS modular multiplication algorithms. We observe that the fastest implementation uses the base extension of Kawamura et al. Simulations with the GEM5 simulator show that, for In Order processors, RNS modular multiplication with Kawamura's base extension is 2.76 times faster with the dedicated word-size modular arithmetic instructions than with pseudo-Mersenne moduli. It is more than 3 times faster for Out of Order processors. Compared to x86 architectures, RISC-V simulations show that using the dedicated instructions requires 4.5 times fewer cycles on In Order processors and 8 times fewer on Out of Order ones.
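For context on the software baseline mentioned above, here is a hedged sketch of a pseudo-Mersenne reduction, assuming a modulus of the form $m = 2^{64} - c$ with $0 < c < 2^{32}$. The paper's exact moduli and code are not shown here, so this form, the names `pm_reduce` and `pm_mulmod`, and the use of the GCC/Clang `unsigned __int128` extension are assumptions. The idea is that $2^{64} \equiv c \pmod m$, so the high word of a product can be folded into the low word with a multiplication by $c$ instead of a division.

```c
#include <assert.h>
#include <stdint.h>

/* Reduce a 128-bit value p modulo m = 2^64 - c (assumption: 0 < c < 2^32). */
uint64_t pm_reduce(unsigned __int128 p, uint64_t c) {
    assert(c > 0 && c < ((uint64_t)1 << 32));
    /* Fold: hi * 2^64 + lo is congruent to hi * c + lo modulo m.
       hi * c < 2^96, so the sum cannot overflow 128 bits. */
    while (p >> 64)
        p = (p >> 64) * c + (uint64_t)p;
    uint64_t r = (uint64_t)p;
    uint64_t m = 0 - c;                /* 2^64 - c in 64-bit arithmetic */
    return (r >= m) ? r - m : r;       /* r < 2^64 < 2m: one subtraction suffices */
}

/* Modular multiplication of reduced operands using the folding reduction. */
uint64_t pm_mulmod(uint64_t a, uint64_t b, uint64_t c) {
    return pm_reduce((unsigned __int128)a * b, c);
}
```

Even in this compact form, one modular multiplication costs a wide multiply, up to three folding steps, and a conditional subtraction, which is the instruction sequence a single word-size modular multiplication instruction would replace.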

Paper Structure

This paper contains 24 sections, 4 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: The instructions format for word-size modular arithmetic
  • Figure 2: Intrinsic C function for word-size modular addition
  • Figure 3: RNS modular multiplication timing in clock cycle number, with In Order RISC-V model, mulmod delay: 4, addmod delay: 2
  • Figure 4: RNS modular multiplication Speed-Up, comparison of Kawamura et al.'s versus Szabo and Tanaka methods
  • Figure 5: RNS modular multiplication Speed-Up of Inst. over the C modulo operation and pseudo-Mersenne reduction (PM), In Order processor model
  • ...and 1 more figure