
Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset

Xijie Ba, Qin Liu, Xiaohong Li, Jianting Ning

TL;DR

This paper addresses privacy risks in substring-SSE by presenting the first leakage-abuse attack under partially known datasets. It extends the LEAP framework with a matrix-based correlation approach that leverages a suffix-tree-based substring index to recover plaintext substrings, mapping encrypted tokens to alphabet characters and substrings via iterative column/row mappings. Experimental evaluation on the Enron corpus shows strong recovery performance, achieving up to 97.87% alphabet recovery and 98.32% string recovery with 50% auxiliary knowledge and complete recovery at 60% knowledge, while remaining robust to dataset size (degradation below 5% for up to 30,000 strings). These results reveal substantial privacy risks in current substring-SSE designs and underscore the urgent need for leakage-resilient constructions and defenses.
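The iterative column/row mapping can be illustrated with a toy sketch. It is only loosely modeled on the LEAP idea of alternating between matching rows (encrypted strings against known strings) and columns (query tokens against characters); it is not the authors' algorithm, and the helper `leap_style_recover`, the token names, and the data are hypothetical. Assumptions: the server leaks, for every query token, the set of encrypted string ids it matches; each token stands for exactly one character; every character of the alphabet has been queried; and every known string is present in the encrypted dataset.

```python
def leap_style_recover(results, known, alphabet):
    """Toy row/column matching in the spirit of LEAP (hypothetical helper).

    results:  query token -> set of encrypted string ids returned (leakage)
    known:    plaintext strings the attacker knows are in the dataset
    alphabet: full character alphabet, one query token per character (assumed)
    Returns (row_map, col_map): encrypted id -> index into known, token -> char.
    """
    enc_ids = sorted(set().union(*results.values()))
    # Row sum of the leakage matrix: how many tokens matched each encrypted row.
    # With every character queried, this equals the row's number of distinct chars.
    enc_sum = {e: sum(e in ids for ids in results.values()) for e in enc_ids}
    row_map, col_map = {}, {}

    def consistent(e, k):
        # Row pair (e, k) agrees on the row sum and on every mapped column.
        if enc_sum[e] != len(set(known[k])):
            return False
        return all((e in results[t]) == (c in known[k]) for t, c in col_map.items())

    changed = True
    while changed:
        changed = False
        # Row mapping: accept a known string with exactly one consistent free row.
        for k in range(len(known)):
            if k in row_map.values():
                continue
            cands = [e for e in enc_ids if e not in row_map and consistent(e, k)]
            if len(cands) == 1:
                row_map[cands[0]] = k
                changed = True
        # Column mapping: compare columns restricted to the matched rows and
        # accept a token when exactly one unused character has the same column.
        rows = sorted(row_map)
        for t, ids in results.items():
            if t in col_map:
                continue
            vec = [e in ids for e in rows]
            cands = [c for c in alphabet
                     if c not in col_map.values()
                     and [c in known[row_map[e]] for e in rows] == vec]
            if len(cands) == 1:
                col_map[t] = cands[0]
                changed = True
    return row_map, col_map


# Hidden plaintexts: 10 = "abc", 11 = "ab", 12 = "a", 13 = "bd" (13 is unknown).
results = {"t1": {10, 11, 12}, "t2": {10, 11, 13}, "t3": {10}, "t4": {13}}
known = ["abc", "ab", "a"]
rows, cols = leap_style_recover(results, known, "abcd")
print(sorted(rows.items()))  # [(10, 0), (11, 1), (12, 2)]
print(sorted(cols.items()))  # [('t1', 'a'), ('t2', 'b'), ('t3', 'c'), ('t4', 'd')]
```

Row 13 (plaintext "bd", outside the known set) stays unmatched, yet the token for `d` is still recovered because `d` is the only character absent from every matched known string. A mapping is accepted only when it is the unique consistent candidate, which keeps each step sound under the stated assumptions at the cost of leaving ambiguous tokens unmapped until later rounds resolve them.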

Abstract

Substring-searchable symmetric encryption (substring-SSE) has become increasingly critical for privacy-preserving applications in cloud systems. However, existing schemes remain vulnerable to information leakage during search operations, particularly when adversaries possess partial knowledge of the target dataset. Although leakage-abuse attacks have been widely studied for traditional SSE, their applicability to substring-SSE under partially known data assumptions remains unexplored. In this paper, we present the first leakage-abuse attack on substring-SSE under partially known dataset conditions. We develop a novel matrix-based correlation technique that extends and optimizes the LEAP framework for substring-SSE, enabling efficient recovery of plaintext data from encrypted suffix tree structures. Unlike existing approaches that rely on independent auxiliary datasets, our method directly exploits known data fragments to establish high-confidence mappings between ciphertext tokens and plaintext substrings through iterative matrix transformations. Comprehensive experiments on real-world datasets demonstrate the effectiveness of the attack, with recovery rates reaching 98.32% for substrings given 50% auxiliary knowledge. Even with only 10% prior knowledge, the attack achieves 74.42% substring recovery while maintaining strong scalability across datasets of varying sizes. The results reveal significant privacy risks in current substring-SSE designs and highlight the urgent need for leakage-resilient constructions.
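Alphabet recovery and string recovery measure different things, and a small sketch makes the distinction concrete. It assumes each encrypted string is observed as a sequence of per-character tokens, counts alphabet recovery as the share of tokens mapped to the correct character, and counts a string as recovered only if every one of its tokens is correct. These metric definitions and the names `decode` and `recovery_rates` are hypothetical choices for illustration; the paper's exact definitions may differ.

```python
def decode(enc_string, recovered):
    # Render a token sequence as plaintext; unmapped tokens become '?'.
    return "".join(recovered.get(t, "?") for t in enc_string)


def recovery_rates(enc_strings, truth, recovered):
    # Alphabet recovery: share of character tokens mapped to the right character.
    alpha = sum(recovered.get(t) == c for t, c in truth.items()) / len(truth)
    # String recovery: a string counts only if all of its tokens are right.
    ok = sum(all(recovered.get(t) == truth[t] for t in s) for s in enc_strings)
    return alpha, ok / len(enc_strings)


truth = {"t1": "a", "t2": "b", "t3": "c", "t4": "d"}  # hidden ground truth
recovered = {"t1": "a", "t2": "b", "t3": "c"}         # 'd' was not recovered
enc_strings = [["t1", "t2", "t3"], ["t1", "t2"], ["t2", "t4"], ["t4", "t4"]]
print([decode(s, recovered) for s in enc_strings])     # ['abc', 'ab', 'b?', '??']
print(recovery_rates(enc_strings, truth, recovered))   # (0.75, 0.5)
```

Here one unrecovered character costs a quarter of the alphabet but half of the strings. Conversely, if the unrecovered characters were rare, the alphabet rate would fall more than the string rate, so under these definitions the two metrics can diverge in either direction.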

Paper Structure

This paper contains 32 sections, 6 equations, 4 figures, 4 tables, and 5 algorithms.

Figures (4)

  • Figure 1: The system model of substring searchable symmetric encryption.
  • Figure 2: Suffix tree construction using "hello" and "help" as examples.
  • Figure 3: Recovery rates of characters, strings, and initial paths under varying knowledge-set sizes.
  • Figure 4: Recovery accuracy of characters, strings, and initial paths under varying dataset sizes (number of strings).
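The suffix-tree index illustrated in Figure 2 can be approximated in a few lines. For brevity this sketch builds an uncompressed generalized suffix trie rather than a path-compressed suffix tree (a suffix tree merges the single-child chains below into labeled edges), and it records at each node the ids of the strings whose substrings pass through it. Assuming a substring query leaks the set of matching string ids (its access pattern), `search` returns exactly that leakage. `TrieNode`, `build_suffix_trie`, `search`, and `count_nodes` are hypothetical names, not taken from the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # next character -> child TrieNode
        self.string_ids = set()  # ids of strings containing this node's substring


def build_suffix_trie(strings):
    # Insert every suffix of every string; each node spells one distinct substring.
    root = TrieNode()
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, TrieNode())
                node.string_ids.add(sid)
    return root


def search(root, pattern):
    # Walk the pattern from the root; the ids at the end node are the matches,
    # i.e. the access pattern a substring query would reveal (assumed leakage).
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return set()
    return node.string_ids


def count_nodes(node):
    # Count this node plus all of its descendants.
    return 1 + sum(count_nodes(child) for child in node.children.values())


strings = ["hello", "help"]
root = build_suffix_trie(strings)
print(sorted(search(root, "hel")))  # [0, 1]
print(sorted(search(root, "llo")))  # [0]
print(sorted(search(root, "lp")))   # [1]
print(count_nodes(root))            # 19
```

Every distinct substring becomes one node, so this trie has 19 nodes: the root plus the 18 distinct substrings of "hello" and "help". A suffix tree answers the same queries with fewer nodes by compressing single-child chains, which matters for index size but not for what each query reveals.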