Fiaingen: A financial time series generative method matching real-world data quality

Jože M. Rožanec, Tina Žezlin, Laurentiu Vasiliu, Dunja Mladenić, Radu Prodan, Dumitru Roman

TL;DR

This work tackles data scarcity in financial machine learning by introducing Fiaingen, a graph-based generative framework that transforms time series into visibility graphs (NVG, HVG, and multigraphs) and synthesizes new sequences via graph-based generation. Across realism, downstream task performance, and runtime, Fiaingen-based methods consistently outperform state-of-the-art baselines (TimeGAN, cGAN, STS, DiffusionTS), achieving high latent-space overlap with real data as shown by t-SNE and robust ROC AUC in classification tasks. Crucially, the graph-based approach delivers orders of magnitude faster generation times (seconds to minutes) compared to hours or days for neural baselines, enabling scalable and potentially real-time deployment. The results suggest that leveraging topological time-series representations preserves temporal structure and inter-asset dependencies more effectively, offering practical value for finance applications and future work on rare-event dynamics and scenario labeling.
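To make the graph representations named above concrete, the following is a minimal sketch of how a time series can be mapped to a natural visibility graph (NVG) and a horizontal visibility graph (HVG) using the standard visibility criteria. It covers only the series-to-graph transformation, not the Fiaingen generation step, which this summary does not describe. The function names and the edge-set representation are illustrative assumptions, not the authors' code.

```python
from typing import Sequence, Set, Tuple

Edge = Tuple[int, int]  # (earlier index, later index); illustrative representation


def natural_visibility_graph(series: Sequence[float]) -> Set[Edge]:
    """Connect points a < b if every point between them lies strictly
    below the straight line joining (a, y_a) and (b, y_b)."""
    n = len(series)
    edges: Set[Edge] = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Height of the line of sight at position c, compared with y_c.
            if all(
                series[c] < yb + (ya - yb) * (b - c) / (b - a)
                for c in range(a + 1, b)
            ):
                edges.add((a, b))
    return edges


def horizontal_visibility_graph(series: Sequence[float]) -> Set[Edge]:
    """Connect points a < b if every point between them lies strictly
    below both endpoints (a horizontal line of sight)."""
    n = len(series)
    edges: Set[Edge] = set()
    for a in range(n - 1):
        for b in range(a + 1, n):
            floor = min(series[a], series[b])
            if all(series[c] < floor for c in range(a + 1, b)):
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    prices = [3.0, 1.0, 2.0, 4.0]  # toy values, not real market data
    nvg = natural_visibility_graph(prices)
    hvg = horizontal_visibility_graph(prices)
    print(sorted(nvg))
    print(sorted(hvg))
```

This brute-force version checks every pair of points and is meant for readability rather than speed. Under these definitions, every HVG edge is also an NVG edge, since a horizontal line of sight implies a natural one.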

Abstract

Data is vital in enabling machine learning models to advance research and practical applications in finance, where accurate and robust models are essential for investment and trading decision-making. However, despite the quantity, quality, and variety of real-world data, it remains limited. The shortage of data for various financial assets directly hinders the performance of machine learning models designed to trade and invest in these assets. Generative methods can mitigate this shortage. In this paper, we introduce a set of novel techniques for time series data generation (we name them Fiaingen) and assess their performance across three criteria: (a) overlap of real-world and synthetic data in a reduced-dimensionality space, (b) performance on downstream machine learning tasks, and (c) runtime performance. Our experiments demonstrate that the methods achieve state-of-the-art performance across all three criteria. Synthetic data generated with Fiaingen methods more closely mirrors the original time series data while keeping data generation time close to seconds, which supports the scalability of the proposed approach. Furthermore, models trained on this synthetic data achieve performance close to that of models trained on real-world data.

Paper Structure

This paper contains 17 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Diagram detailing the generative process considering NVG.
  • Figure 2: t-SNE visualizations comparing real vs. synthetic time series for each generative strategy (window size = 20). Blue: real, Yellow: synthetic.
  • Figure 3: t-SNE visualizations comparing real vs. synthetic time series for NVG and HVG generative strategies (window size = 60). Blue: real, Yellow: synthetic.