On the Independence Assumption in Quasi-Cyclic Code-Based Cryptography

Maxime Bombar, Nicolas Resch, Emiel Wiedijk

TL;DR

The paper investigates the independence assumption in noise modeling for quasi-cyclic code-based cryptography, focusing on HQC. It shows that the noise term $e(X)=t(X)R(X)$ generally does not have independent coordinates, and it quantifies the deviation from independence using the KL-divergence $D(tR\|I)=n(\tilde h(\|t\|\omega)-\tilde h(\omega))$, where $\tilde h$ is the Bernoulli entropy. This challenges the common modeling assumption. For sums of several such products, structured choices of the $t_i$ (e.g., identical polynomials or arithmetic progressions) can suppress entropy and yield nontrivial statistical distance from a product of Bernoulli distributions; the paper gives concrete bounds via entropy analyses and a generalized reverse Pinsker inequality. These results have implications for decryption failure rate analyses and for potential worst-case to average-case reductions. They indicate when independence-based modeling is valid, and the paper outlines open questions about high-entropy regimes and noise-weight concentration in HQC-like schemes.
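The KL-divergence formula above can be checked by brute force on a toy instance. The sketch below is not taken from the paper: it assumes that the first entropy argument, $\|t\|\omega$, denotes the marginal parameter of one coordinate of $tR$ as given by the piling-up lemma, and it uses a hypothetical small example ($n=5$, $t(X)=1+X+X^2$, $\omega=0.1$) in which $t$ is invertible in $\mathbb F_2[X]/(X^n-1)$. The function names are illustrative only.

```python
from itertools import product
from math import log2


def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def cyclic_mul(t_support, r, n):
    """Multiply t(X), given by its support, with r(X) in F_2[X]/(X^n - 1)."""
    e = [0] * n
    for s in t_support:
        for i, bit in enumerate(r):
            if bit:
                e[(i + s) % n] ^= 1
    return tuple(e)


def kl_to_independent_model(t_support, n, omega):
    """Exact KL-divergence (bits) between t*R and the i.i.d. model I."""
    tau = len(t_support)
    # Piling-up lemma: XOR of tau independent Ber(omega) bits.
    omega_marg = (1 - (1 - 2 * omega) ** tau) / 2
    # Exact distribution of e = t * R by enumerating all 2^n values of R.
    dist = {}
    for r in product((0, 1), repeat=n):
        w = sum(r)
        p = omega ** w * (1 - omega) ** (n - w)
        e = cyclic_mul(t_support, r, n)
        dist[e] = dist.get(e, 0.0) + p
    # Compare against independent coordinates with the same marginal.
    kl = 0.0
    for e, p in dist.items():
        w = sum(e)
        q = omega_marg ** w * (1 - omega_marg) ** (n - w)
        kl += p * log2(p / q)
    return kl, omega_marg


# Toy parameters (assumptions for illustration, not HQC parameters).
n, omega = 5, 0.1
t_support = [0, 1, 2]  # t(X) = 1 + X + X^2, invertible modulo X^5 - 1
kl, omega_marg = kl_to_independent_model(t_support, n, omega)
closed_form = n * (binary_entropy(omega_marg) - binary_entropy(omega))
print(kl, closed_form)
```

Under these assumptions the two printed values should agree up to floating-point error. The enumeration is exponential in $n$, so this only serves as a sanity check on very small rings.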

Abstract

Cryptography based on the presumed hardness of decoding codes -- i.e., code-based cryptography -- has recently seen increased interest due to its plausible security against quantum attackers. Notably, of the four proposals for the NIST post-quantum standardization process that were advanced to their fourth round for further review, two were code-based. The most efficient proposals -- including HQC and BIKE, the NIST submissions alluded to above -- in fact rely on the presumed hardness of decoding structured codes. Of particular relevance to our work, HQC is based on quasi-cyclic codes, which are codes generated by matrices consisting of two cyclic blocks. In particular, the security analysis of HQC requires a precise understanding of the Decryption Failure Rate (DFR), whose analysis relies on the following heuristic: given random "sparse" vectors $e_1,e_2$ (say, each coordinate is i.i.d. Bernoulli) multiplied by fixed "sparse" quasi-cyclic matrices $A_1,A_2$, the weight of the resulting vector $e_1A_1+e_2A_2$ is very concentrated around its expectation. In the documentation, the authors model the distribution of $e_1A_1+e_2A_2$ as a vector with independent coordinates (and correct marginal distribution). However, we uncover cases where this modeling fails. While this does not invalidate the (empirically verified) heuristic that the weight of $e_1A_1+e_2A_2$ is concentrated, it does suggest that the behavior of the noise is a bit more subtle than previously predicted. Lastly, we also discuss implications of our result for potential worst-case to average-case reductions for quasi-cyclic codes.
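The weight heuristic described in the abstract can be explored with a small Monte Carlo experiment. The following is only a sketch with hypothetical toy parameters, not the HQC parameter sets or the authors' experiment. It writes the noise in polynomial form as $t_1R_1+t_2R_2$, with fixed sparse $t_1,t_2$ and i.i.d. Bernoulli $R_1,R_2$, and compares the empirical mean and variance of the weight with what the independent-coordinates model predicts. The supports `t1`, `t2` and the function names are made up for illustration.

```python
import random


def sparse_cyclic_product(support, r, n):
    """Compute t(X) * r(X) in F_2[X]/(X^n - 1), with t given by its support."""
    e = [0] * n
    for s in support:
        for i in range(n):
            if r[i]:
                e[(i + s) % n] ^= 1
    return e


def simulate_noise_weights(t1, t2, n, omega, trials, seed=0):
    """Sample the Hamming weight of t1*R1 + t2*R2 for i.i.d. Ber(omega) R1, R2."""
    rng = random.Random(seed)  # seeded for reproducibility
    weights = []
    for _ in range(trials):
        r1 = [1 if rng.random() < omega else 0 for _ in range(n)]
        r2 = [1 if rng.random() < omega else 0 for _ in range(n)]
        e1 = sparse_cyclic_product(t1, r1, n)
        e2 = sparse_cyclic_product(t2, r2, n)
        weights.append(sum(a ^ b for a, b in zip(e1, e2)))
    return weights


# Toy parameters (assumptions for illustration only).
n, omega, trials = 101, 0.05, 2000
t1 = [0, 3, 17, 42, 77]
t2 = [1, 8, 29, 50, 90]
weights = simulate_noise_weights(t1, t2, n, omega, trials)

# Each coordinate is an XOR of len(t1) + len(t2) independent Ber(omega) bits,
# so the piling-up lemma gives its marginal parameter.
tau = len(t1) + len(t2)
omega_marg = (1 - (1 - 2 * omega) ** tau) / 2

mean = sum(weights) / len(weights)
var = sum((w - mean) ** 2 for w in weights) / (len(weights) - 1)
print("empirical mean:", mean, "model mean:", n * omega_marg)
print("empirical variance:", var, "independent-model variance:", n * omega_marg * (1 - omega_marg))
```

The mean is fixed by the marginals alone, so it should match the model regardless of dependence between coordinates. Any gap in the variance is one place where a departure from independence could become visible, and rerunning with structured supports (for example `t1 == t2`) is a simple way to probe that.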

Paper Structure

This paper contains 16 sections, 15 theorems, 79 equations, 1 table.

Key Result

Theorem 1.1

Let $t(X) \in \mathbb F_2[X]/(X^n-1)$ be a fixed polynomial with $\tau$ nonzero coefficients, and let $R(X) = \sum_{i=0}^{n-1} R_i X^i$ be such that each $R_i \leftarrow \mathrm{Ber}(\omega)$ independently. Let $I(X) = \sum_{i=0}^{n-1} I_i X^i$ where each $I_i \leftarrow$ ...

Theorems & Definitions (25)

  • Theorem 1.1: Main Theorem (Informal); see the formal statement labeled thm:noiseStatDistLower
  • Lemma 2.1: Piling-up lemma
  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Lemma 3.3: Expectation of $|t R|$
  • proof
  • Lemma 3.4
  • proof
  • ...and 15 more