Evaluating utility in synthetic banking microdata applications

Hugo E. Caceres, Ben Moews

TL;DR

This paper develops a framework that considers the utility and privacy requirements of regulators and applies it to financial usage indices, term deposit yield curves, and credit card transition matrices. It finds that applications less susceptible to post-processing information loss are particularly suited to synthetic data generation, and that marginal-based inference mechanisms outperform generative adversarial network models for these applications.

Abstract

Financial regulators such as central banks collect vast amounts of data, but access to the resulting fine-grained banking microdata is severely restricted by banking secrecy laws. Recent developments have resulted in mechanisms that generate faithful synthetic data, but current evaluation frameworks lack a focus on the specific challenges of banking institutions and microdata. We develop a framework that considers the utility and privacy requirements of regulators, and apply it to financial usage indices, term deposit yield curves, and credit card transition matrices. Using the Central Bank of Paraguay's data, we provide the first implementation of synthetic banking microdata using a central bank's collected information, with the resulting synthetic datasets for all three domain applications being publicly available and featuring information not yet released in statistical disclosure. We find that applications less susceptible to post-processing information loss, which are based on frequency tables, are particularly suited for this approach, and that marginal-based inference mechanisms outperform generative adversarial network models for these applications. Our results demonstrate that synthetic data generation is a promising privacy-enhancing technology for financial regulators seeking to complement their statistical disclosure, while highlighting the crucial role of evaluating such endeavors in terms of utility and privacy requirements.

Paper Structure

This paper contains 24 sections, 13 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Paraguay's banked population by usage indicators $B_{s,l}^u$ with levels low (1), medium (2), and high (3). Usage indicators are calculated with (original) banking and synthetic microdata from 2023.
  • Figure 2: Yield curves produced with synthetic term deposit microdata using MST, PATECTGAN, and PAC, for the Central Bank of Paraguay's (CBP) and data-driven (DD) pre-processing. Each subplot includes the scatter points of the calculated weighted average interest rate for term bins and a LOWESS curve.
  • Figure 3: Nelson-Siegel-Svensson curves calculated with synthetic data for each mechanism, with the CBP's pre-processing strategy and 2023 term deposit data.
  • Figure 4: Delinquency rate of credit cards across age and gender groups, based on 2023 (original) banking and synthetic microdata using the CBP's pre-processing strategy (PS). Missing values correspond to levels without generated data.
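
For readers unfamiliar with the Nelson-Siegel-Svensson curves named in the Figure 3 caption, the following is a minimal sketch of the standard functional form in Python. It is not the paper's implementation: the function name `nss_yield` and all parameter values are hypothetical and chosen only for illustration, and the fitting step that would estimate the parameters from term deposit data is omitted.

```python
import math


def nss_yield(tau, beta0, beta1, beta2, beta3, lambda1, lambda2):
    """Standard Nelson-Siegel-Svensson yield at maturity tau > 0.

    tau, lambda1 and lambda2 must share the same time unit (e.g. years).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    x1 = tau / lambda1
    x2 = tau / lambda2
    # (1 - exp(-x)) / x, written with expm1 for accuracy at small x
    slope = -math.expm1(-x1) / x1
    hump1 = slope - math.exp(-x1)
    hump2 = -math.expm1(-x2) / x2 - math.exp(-x2)
    return beta0 + beta1 * slope + beta2 * hump1 + beta3 * hump2


# Hypothetical parameters, not taken from the paper.
params = dict(beta0=0.08, beta1=-0.03, beta2=0.01, beta3=0.005,
              lambda1=1.5, lambda2=5.0)

# Evaluate the curve on a few illustrative maturities (in years).
curve = [(t, nss_yield(t, **params)) for t in (0.25, 0.5, 1, 2, 3, 5, 10)]
for maturity, rate in curve:
    print(f"{maturity:>5} y: {rate:.4%}")
```

In this form, `beta0` is the long-maturity level and `beta0 + beta1` is the short-maturity limit, while the two hump terms control curvature at horizons set by `lambda1` and `lambda2`.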