Differentially Private Relational Learning with Entity-level Privacy Guarantees
Yinan Huang, Haoteng Yin, Eli Chien, Rongzhe Wei, Pan Li
TL;DR
The paper tackles entity-level differential privacy for relational learning by identifying two core difficulties: high gradient sensitivity when nodes participate in many relations, and the complex, coupled sampling used in relational training. It develops an adaptive gradient clipping strategy that scales clipping thresholds based on node occurrence, and it derives a tractable privacy amplification bound for a subclass of coupled sampling in which the dependence arises only through sample sizes. Integrating these ideas, the authors present a DP-SGD variant tailored for relational data and demonstrate its practical utility by privately fine-tuning text encoders on text-attributed graphs, achieving favorable privacy-utility trade-offs. The work advances privacy guarantees for graph-structured learning and provides concrete methodology and empirical validation for real-world relational applications.
Abstract
Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend the privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP.
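To make the idea of occurrence-based clipping concrete, the following is a minimal sketch and not the authors' implementation (the linked repository is the reference for that). It assumes each relation is a pair of entities with one per-relation gradient, and it assumes a simple hypothetical rule: a relation's clipping threshold is base_clip divided by the larger occurrence count of its two endpoints within the batch. The names adaptive_clip_and_sum, add_gaussian_noise, base_clip, and noise_multiplier are illustrative and do not come from the paper.

```python
import numpy as np
from collections import Counter


def adaptive_clip_and_sum(edges, grads, base_clip):
    """Clip per-relation gradients with occurrence-dependent thresholds.

    ASSUMPTION (illustrative rule, not taken from the paper): the threshold
    for relation (u, v) is base_clip / max(occ[u], occ[v]), where occ counts
    how many relations in this batch contain each entity.
    """
    # Count how often each entity appears in the batch of relations.
    occ = Counter()
    for u, v in edges:
        occ[u] += 1
        occ[v] += 1

    total = np.zeros_like(grads[0], dtype=float)
    for (u, v), g in zip(edges, grads):
        # Entities in many relations get a smaller per-relation threshold.
        thresh = base_clip / max(occ[u], occ[v])
        norm = np.linalg.norm(g)
        # Standard norm clipping: shrink g only if its norm exceeds thresh.
        scale = min(1.0, thresh / (norm + 1e-12))
        total += scale * g
    return total, occ


def add_gaussian_noise(total, base_clip, noise_multiplier, seed=None):
    """Add isotropic Gaussian noise with std noise_multiplier * base_clip."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_multiplier * base_clip, size=total.shape)
    return total + noise


if __name__ == "__main__":
    # Entity 0 takes part in three relations; entities 1, 2, 3 in one each.
    edges = [(0, 1), (0, 2), (0, 3)]
    grads = [np.full(4, 10.0) for _ in edges]
    total, occ = adaptive_clip_and_sum(edges, grads, base_clip=1.0)
    print("occurrences:", dict(occ))
    print("norm of clipped sum:", np.linalg.norm(total))
    print("noisy sum:", add_gaussian_noise(total, 1.0, 1.0, seed=0))
```

Under this rule, the relations containing a given entity contribute a combined norm of at most base_clip to the sum, however many relations the entity appears in. This bounds an entity's direct contribution only. It is not a full sensitivity analysis, because removing an entity also changes the occurrence counts, and hence the thresholds, of its neighbors. A complete analysis also has to account for the sampling procedure.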
