Privacy-Preserved Neural Graph Databases
Qi Hu, Haoran Li, Jiaxin Bai, Zihao Wang, Yangqiu Song
TL;DR
This paper addresses privacy leakage risks in privacy-sensitive neural graph databases (NGDBs) arising from complex query answering. It proposes Privacy-Preserved NGDB (P-NGDB), which uses adversarial training to obfuscate private information while preserving public query accuracy, formalizes privacy definitions, and builds a three-dataset benchmark (FB15K-N, DB15K-N, YAGO15K-N) to evaluate both CQA performance and privacy protection. The approach combines query encoding with a dual-objective learning framework $L = L_u + \beta L_p$, enabling controllable privacy protection via the parameter $\beta$. Empirical results show that P-NGDB effectively reduces privacy leakage with only a modest drop in public retrieval quality, outperforming simple noise-based baselines and offering a practical path toward safer RAG over domain-private graphs.
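To make the dual objective $L = L_u + \beta L_p$ concrete, the following is a minimal sketch and not the authors' implementation. It assumes that $L_u$ is a negative log-likelihood over scores of public answers and that $L_p$ penalizes high scores on private answers. Both forms, along with the function names `utility_loss`, `privacy_loss`, and `total_loss`, are hypothetical choices made for illustration only.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def utility_loss(public_scores):
    # Assumed form of L_u: reward high scores on public answers.
    return -sum(log_sigmoid(s) for s in public_scores) / len(public_scores)


def privacy_loss(private_scores):
    # Assumed form of L_p: penalize high scores on private answers,
    # using log(1 - sigmoid(s)) = log(sigmoid(-s)).
    return -sum(log_sigmoid(-s) for s in private_scores) / len(private_scores)


def total_loss(public_scores, private_scores, beta):
    # L = L_u + beta * L_p; beta controls the strength of privacy protection.
    return utility_loss(public_scores) + beta * privacy_loss(private_scores)


if __name__ == "__main__":
    public = [2.0, 1.5]   # toy scores for public answers
    private = [1.0]       # toy score for a private answer
    for beta in (0.0, 0.5, 1.0):
        print(beta, total_loss(public, private, beta))
```

Under these assumptions, $\beta = 0$ recovers the pure utility objective, and increasing $\beta$ puts more weight on pushing down the scores of private answers, at a possible cost to public retrieval quality.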
Abstract
In the era of large language models (LLMs), efficient and accurate data retrieval has become increasingly crucial for using domain-specific or private data in retrieval-augmented generation (RAG). Neural graph databases (NGDBs) have emerged as a powerful paradigm that combines the strengths of graph databases (GDBs) and neural networks to enable efficient storage, retrieval, and analysis of graph-structured data, and they can be adaptively trained with LLMs. Neural embedding storage and complex neural logical query answering (CQA) give NGDBs the ability to generalize. When the graph is incomplete, NGDBs can fill gaps in the graph structure by extracting latent patterns and representations, revealing hidden relationships and enabling accurate query answering. Nevertheless, this capability comes with an inherent trade-off: it introduces additional privacy risks for domain-specific or private databases. Malicious attackers can infer sensitive information in the database using well-designed queries. For example, from the answer set of a query asking where Turing Award winners born after 1940 and before 1950 lived, the places where Turing Award winner Hinton lived are likely to be exposed, even though they may have been deleted in the training stage due to privacy concerns. In this work, we propose a privacy-preserved neural graph database (P-NGDB) framework to alleviate the risk of privacy leakage in NGDBs. We introduce adversarial training techniques in the training stage to force NGDBs to generate indistinguishable answers when queried with private information, making it more difficult to infer sensitive information through combinations of multiple innocuous queries.
