On the Independence Assumption in Quasi-Cyclic Code-Based Cryptography
Maxime Bombar, Nicolas Resch, Emiel Wiedijk
TL;DR
The paper investigates the independence assumption in noise modeling for quasi-cyclic code-based cryptography, focusing on HQC. It shows that the noise term $e(X)=t(X)R(X)$ generally does not have independent coordinates, and quantifies the deviation from independence using KL-divergence: $D(tR\|I)=n(\tilde h(\|t\|\omega)-\tilde h(\omega))$, with $\tilde h$ the Bernoulli entropy; this challenges the common modeling assumption. For sums of multiple such products, structured choices of $t_i$ (e.g., identical or arithmetic-progressions) can suppress entropy and cause nontrivial statistical distances from Bernoulli products, providing concrete bounds via entropy analyses and a generalized reverse Pinsker inequality. These results have implications for decryption failure rate analyses and potential worst-case to average-case reductions, highlighting when independence-based modeling is valid and outlining open questions about high-entropy regimes and noise-weight concentration in HQC-like schemes.
Abstract
Cryptography based on the presumed hardness of decoding codes -- i.e., code-based cryptography -- has recently seen increased interest due to its plausible security against quantum attackers. Notably, of the four proposals for the NIST post-quantum standardization process that were advanced to their fourth round for further review, two were code-based. The most efficient proposals -- including HQC and BIKE, the NIST submissions alluded to above -- in fact rely on the presumed hardness of decoding structured codes. Of particular relevance to our work, HQC is based on quasi-cyclic codes, which are codes generated by matrices consisting of two cyclic blocks. In particular, the security analysis of HQC requires a precise understanding of the Decryption Failure Rate (DFR), whose analysis relies on the following heuristic: given random ``sparse'' vectors $e_1,e_2$ (say, each coordinate is i.i.d. Bernoulli) multiplied by fixed ``sparse'' quasi-cyclic matrices $A_1,A_2$, the weight of resulting vector $e_1A_1+e_2A_2$ is very concentrated around its expectation. In the documentation, the authors model the distribution of $e_1A_1+e_2A_2$ as a vector with independent coordinates (and correct marginal distribution). However, we uncover cases where this modeling fails. While this does not invalidate the (empirically verified) heuristic that the weight of $e_1A_1+e_2A_2$ is concentrated, it does suggest that the behavior of the noise is a bit more subtle than previously predicted. Lastly, we also discuss implications of our result for potential worst-case to average-case reductions for quasi-cyclic codes.
