Six Levels of Privacy: A Framework for Financial Synthetic Data
Tucker Balch, Vamsi K. Potluru, Deepak Paramanand, Manuela Veloso
TL;DR
The paper addresses privacy risks in financial synthetic data by introducing a six-level privacy framework that categorizes generation methods, ranging from simple PII obscuration to uncalibrated simulations. It combines an attacker taxonomy, regulatory context, and practical testing to guide practitioners in balancing Realism, Privacy, and Utility across use cases. The key contributions are a structured hierarchy, explicit testing guidelines, and illustrative calibration approaches for Level 5; the framework may also apply to other industries. It enables financial institutions to share, augment, and test synthetic data securely, and it offers actionable guidance for selecting an appropriate privacy level based on risk and utility considerations.
Abstract
Synthetic Data is increasingly important in financial applications. Alongside the benefits it provides, such as improved financial modeling and better testing procedures, it also poses privacy risks. Such data may be derived from client information, business information, or other proprietary sources that must be protected. Even though the process by which Synthetic Data is generated serves to obscure the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of ``levels'' of privacy that are useful for categorizing Synthetic Data generation methods and the progressively improved protections they offer. While the six levels were devised in the context of financial applications, they may be appropriate for other industries as well. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, its privacy risks, and privacy attacks. We close with details of the ``Six Levels'' that include defenses against those attacks.
