Query Recovery from Easy to Hard: Jigsaw Attack against SSE
Hao Nie, Wei Wang, Peng Xu, Xianglong Zhang, Laurence T. Yang, Kaitai Liang
TL;DR
The paper introduces Jigsaw, a three-stage similar-data attack against SSE that exploits the distribution of keyword volume and frequency, together with co-occurrence information, to recover queries. It first locates distinctive queries, then refines candidates via co-occurrence constraints, and finally recovers the remaining queries iteratively, achieving about $90\%$ accuracy on three tested datasets and roughly $60\%$ and $85\%$ accuracy against padding and obfuscation, respectively. The method demonstrates robustness to frequency leakage decay and outperforms prior attacks in many scenarios, challenging existing defenses such as padding and obfuscation. These results underscore significant practical risks for SSE schemes and highlight the need for stronger leakage-control mechanisms, including co-occurrence-aware defenses or stronger access-pattern protections.
Abstract
Searchable symmetric encryption schemes often unintentionally disclose certain sensitive information, such as access, volume, and search patterns. Attackers can exploit such leakages and other available knowledge related to the user's database to recover queries. We find that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords. Queries containing keywords with high volumes/frequencies are more susceptible to recovery, even when countermeasures are implemented. Attackers can also effectively leverage these ``special'' queries to recover all others. By exploiting the above finding, we propose a Jigsaw attack that begins by accurately identifying and recovering those distinctive queries. Leveraging the volume, frequency, and co-occurrence information, our attack achieves $90\%$ accuracy on three tested datasets, which is comparable to previous attacks (Oya et al., USENIX '22 and Damie et al., USENIX '21). With the same runtime, our attack demonstrates an advantage over the attack proposed by Oya et al. (approximately $15\%$ higher accuracy when the keyword universe size is 15k). Furthermore, our proposed attack outperforms existing attacks against widely studied countermeasures, achieving roughly $60\%$ and $85\%$ accuracy against padding and obfuscation, respectively. In this context, with a large keyword universe ($\geq 3$k), it surpasses current state-of-the-art attacks by more than $20\%$.
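
The easy-to-hard idea described above can be illustrated with a small toy sketch. This is not the authors' algorithm: the function names, the squared-error cost, the greedy extension, and the tiny hand-made matrices are all assumptions made for illustration, and frequency (search-pattern) leakage is left out entirely. The sketch only shows how volume can anchor a high-volume query first and how co-occurrence with already matched queries can then help place the rest.

```python
import numpy as np

def volumes_and_cooccurrence(access):
    """access: binary matrix (items x documents); row i marks the documents
    returned for item i. Returns normalized volumes and co-occurrence counts."""
    access = np.asarray(access, dtype=float)
    n_docs = access.shape[1]
    volume = access.sum(axis=1) / n_docs
    cooc = access @ access.T / n_docs
    return volume, cooc

def match_distinctive(q_vol, k_vol, top_k):
    """Assumed simplification: treat the top_k highest-volume queries as the
    distinctive ones and map each to the keyword with the closest volume."""
    order = np.argsort(-q_vol)[:top_k]
    return {int(q): int(np.argmin(np.abs(k_vol - q_vol[q]))) for q in order}

def extend_with_cooccurrence(matches, q_vol, q_cooc, k_vol, k_cooc):
    """Greedily place each remaining query on the unused keyword whose volume
    and co-occurrence with already matched pairs agree best."""
    matches = dict(matches)
    for q in range(len(q_vol)):
        if q in matches:
            continue
        used = set(matches.values())
        best, best_cost = None, float("inf")
        for k in range(len(k_vol)):
            if k in used:
                continue
            cost = (q_vol[q] - k_vol[k]) ** 2
            cost += sum((q_cooc[q, mq] - k_cooc[k, mk]) ** 2
                        for mq, mk in matches.items())
            if cost < best_cost:
                best, best_cost = k, cost
        if best is not None:
            matches[q] = best
    return matches

# Hypothetical attacker-side (similar) data: 3 keywords over 6 documents.
similar = [[1, 1, 1, 1, 1, 0],
           [1, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 1]]
# Hypothetical observed access patterns of 3 queries over the user's 6 documents.
observed = [[0, 1, 1, 0, 0, 0],
            [1, 1, 1, 1, 0, 1],
            [0, 0, 0, 0, 1, 0]]

k_vol, k_cooc = volumes_and_cooccurrence(similar)
q_vol, q_cooc = volumes_and_cooccurrence(observed)
base = match_distinctive(q_vol, k_vol, top_k=1)
full = extend_with_cooccurrence(base, q_vol, q_cooc, k_vol, k_cooc)
print(base, full)
```

In this toy setting the highest-volume query is matched first, and the remaining queries are resolved by consistency with that anchor. A real attack would have to cope with noisy statistics and countermeasures, which this sketch ignores.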
