Zk-SNARK for String Match
Taoran Li, Taobo Liao
TL;DR
This work addresses the problem of detecting sensitive data leaks on public platforms while preserving privacy, using zk-SNARKs to check private strings against public data without revealing them. It combines a sliding-window Rabin-Karp hashing strategy with Rabin fingerprinting to enable efficient, privacy-preserving string matching inside a zk-SNARK circuit, implemented with the gnark library. A Merkle tree-based verification layer provides succinct proofs of substring membership and data integrity, complemented by circuit-friendly hashing such as MiMC. Experimental results indicate that the system preserves privacy while scaling to large inputs, demonstrating the practicality of zero-knowledge proofs for secure data verification and large-scale string matching. The paper also outlines future work on non-membership proofs and a polynomial-based approach to further accelerate verification in privacy-preserving contexts.
Abstract
We present a secure and efficient string-matching platform that uses zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge) to detect sensitive information leakage while preserving data privacy. Our solution enables organizations to verify whether private strings appear on public platforms without disclosing the strings themselves. For computational efficiency, we combine a sliding-window technique with the Rabin-Karp algorithm and Rabin fingerprinting, so that matches are detected by comparing rolling hashes rather than individual characters. For a text of length n and a pattern of length m, this lowers the expected matching cost from the O(nm) of naive character-by-character comparison to O(n + m). We implement the proposed system using gnark, a high-performance zk-SNARK library, which generates succinct and verifiable proofs for privacy-preserving string matching. Experimental results demonstrate that our solution preserves privacy while maintaining computational efficiency and scalability. This work highlights the practical applications of zero-knowledge proofs in secure data verification and contributes a scalable method for privacy-preserving string matching.
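To make the sliding-window idea concrete, the following is a minimal sketch of Rabin-Karp matching with a rolling polynomial hash, written in plain Python outside any circuit. It is not the gnark implementation described in this paper: the function name `rabin_karp_find` and the parameters `base` and `mod` are illustrative assumptions, and inside a zk-SNARK circuit the same arithmetic would instead be expressed as constraints over the proof system's field.

```python
def rabin_karp_find(text: str, pattern: str,
                    base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    `base` and `mod` are illustrative choices (assumptions), not values
    taken from the paper.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window of the text.
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; confirm to rule out collisions.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        # Slide the window: drop text[start], append text[start + m].
        if start + m < n:
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp_find("abracadabra", "abra"))
```

Each window update costs a constant number of modular operations, which is where the saving over re-comparing all m characters per position comes from. The explicit substring check on a hash hit is what keeps the result exact despite possible collisions.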
