Synthetic Data Applications in Finance
Vamsi K. Potluru, Daniel Borrajo, Andrea Coletta, Niccolò Dalmasso, Yousef El-Laham, Elizabeth Fons, Mohsen Ghassemi, Sriram Gopalakrishnan, Vikesh Gosai, Eleonora Kreačić, Ganapathy Mani, Saheed Obitayo, Deepak Paramanand, Natraj Raman, Mikhail Solonin, Srijan Sood, Svitlana Vyetrenko, Haibei Zhu, Manuela Veloso, Tucker Balch
TL;DR
This paper surveys synthetic data applications in finance across multiple data modalities, including tabular, time-series, event-series, and unstructured formats, with emphasis on regulatory and privacy considerations. It reviews generation techniques, from model-based simulators such as ABIDES to neural generators such as CTGAN and TimeGAN, and proposes a privacy-level framework to guide safe deployment. The work highlights metrics for fidelity, utility, and privacy, and discusses data liberation, augmentation, and counterfactual testing as core use cases, illustrated by case studies in fraud detection, marketing journeys, and market simulation. It concludes with open challenges and directions, underscoring the potential of synthetic data to enable robust testing, safer data sharing, and improved decision-making in finance while acknowledging regulatory, ethical, and practical hurdles.
Abstract
Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and provide richer detail for a select few. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are used to evaluate the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the financial domain.
