Synthetic Data Applications in Finance

Vamsi K. Potluru; Daniel Borrajo; Andrea Coletta; Niccolò Dalmasso; Yousef El-Laham; Elizabeth Fons; Mohsen Ghassemi; Sriram Gopalakrishnan; Vikesh Gosai; Eleonora Kreačić; Ganapathy Mani; Saheed Obitayo; Deepak Paramanand; Natraj Raman; Mikhail Solonin; Srijan Sood; Svitlana Vyetrenko; Haibei Zhu; Manuela Veloso; Tucker Balch

TL;DR

This paper surveys synthetic data applications in finance across tabular, time-series, event-series, and unstructured data, with emphasis on regulatory and privacy considerations. It reviews generation techniques, from model-based simulators such as ABIDES to neural generators such as CTGAN and TimeGAN, and proposes a privacy-level framework to guide safe deployment. It describes metrics for fidelity, utility, and privacy, and treats data liberation, augmentation, and counterfactual testing as core use cases, illustrated by case studies in fraud detection, marketing journeys, and market simulation. It concludes with open challenges and directions: synthetic data can enable robust testing, safer data sharing, and improved decision-making in finance, but regulatory, ethical, and practical hurdles remain.

Abstract

Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and provide richer detail for a select few. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both market and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are used to evaluate the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the financial domain.

Paper Structure (75 sections, 1 equation, 19 figures, 13 tables)

Introduction
Data Liberation:
Augmentation:
Counterfactual Scenarios and Testing:
Background and Related Work
Generation techniques
Model-Based Simulation methods
Using Markov models:
Agent-based models:
Metrics
Fidelity:
Utility:
Privacy:
Synthetic Data Generation with Python Libraries
Privacy
...and 60 more sections

Figures (19)

Figure 1: (Left) A Markov model in RDDL [sanner2010relational]. (Right) A multi-agent market simulator [byrd2019abides].
Figure 2: Privacy Level 1: Obscure PII
Figure 3: Privacy Level 2: Obscure PII + noise
Figure 4: Privacy Level 3: Generative modeling. The question mark suggests the possibility of reverse-engineering the data.
Figure 5: Privacy Level 4: Generative modeling + testing
...and 14 more figures
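
The captions of Figures 2 and 3 name the first two privacy levels as "Obscure PII" and "Obscure PII + noise". The sketch below shows one way those two steps might look on a toy tabular record. It is an illustration under stated assumptions, not the paper's method: the field names, the salted-hash masking, and the relative Gaussian noise scale are all hypothetical choices, and salted hashing is only pseudonymization, not a privacy guarantee.

```python
import hashlib
import random

# Hypothetical schema: the source does not specify which columns are PII.
PII_FIELDS = ("name", "account_id")
NUMERIC_FIELDS = ("amount",)


def obscure_pii(record, salt="demo-salt"):
    """Level 1 (per the Figure 2 caption): replace PII values with salted hash tokens."""
    out = dict(record)  # copy so the input record is not mutated
    for field in PII_FIELDS:
        if field in out:
            payload = (salt + str(out[field])).encode("utf-8")
            out[field] = hashlib.sha256(payload).hexdigest()[:12]
    return out


def obscure_pii_with_noise(record, rng, rel_sigma=0.05, salt="demo-salt"):
    """Level 2 (per the Figure 3 caption): Level 1 plus noise on numeric fields.

    The noise model (zero-mean Gaussian, std = rel_sigma * |value|) is an assumption.
    """
    out = obscure_pii(record, salt)
    for field in NUMERIC_FIELDS:
        if field in out:
            value = float(out[field])
            out[field] = round(value + rng.gauss(0.0, rel_sigma * abs(value)), 2)
    return out


record = {"name": "Jane Doe", "account_id": "ACC-001", "amount": 250.0, "merchant": "grocery"}
rng = random.Random(0)  # seeded for repeatability
level1 = obscure_pii(record)
level2 = obscure_pii_with_noise(record, rng)
print(level1)
print(level2)
```

Levels 3 and 4 in Figures 4 and 5 involve generative modeling and testing, which this sketch does not attempt to cover.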
