Six Levels of Privacy: A Framework for Financial Synthetic Data

Tucker Balch, Vamsi K. Potluru, Deepak Paramanand, Manuela Veloso

TL;DR

The paper addresses privacy risks in financial synthetic data by introducing a six-level privacy framework that categorizes generation methods from simple PII obscuration to uncalibrated simulations. It combines an attacker taxonomy, regulatory context, and practical testing to guide practitioners in balancing Realism, Privacy, and Utility across use cases. The key contributions are a structured hierarchy, explicit testing guidelines, and illustrative calibration approaches for Level 5, with applicability potentially extending to other industries. The framework helps financial institutions share, augment, and test synthetic data securely, and offers actionable guidance for selecting a privacy level based on risk and utility considerations.

Abstract

Synthetic Data is increasingly important in financial applications. In addition to the benefits it provides, such as improved financial modeling and better testing procedures, it poses privacy risks. Such data may arise from client information, business information, or other proprietary sources that must be protected. Even though the process by which Synthetic Data is generated serves to obscure the original data to some degree, the extent to which privacy is preserved is hard to assess. Accordingly, we introduce a hierarchy of "levels" of privacy that are useful for categorizing Synthetic Data generation methods and the progressively improved protections they offer. While the six levels were devised in the context of financial applications, they may also be appropriate for other industries. Our paper includes a brief overview of Financial Synthetic Data, how it can be used, how its value can be assessed, privacy risks, and privacy attacks. We close with details of the "Six Levels" that include defenses against those attacks.

Paper Structure (13 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Privacy Level 1: Obscure PII
  • Figure 2: Privacy Level 2: Obscure PII + noise
  • Figure 3: Privacy Level 3: Generative modeling. The question mark suggests the possibility of reverse-engineering the data.
  • Figure 4: Privacy Level 4: Generative modeling + testing
  • Figure 5: Privacy Level 5: Calibrated simulation
  • ...and 1 more figure