Mitigating Model Drift in Developing Economies Using Synthetic Data and Outliers
Ilyas Varshavskiy, Bonu Boboeva, Shuhrat Khalilbekov, Azizjon Azimi, Sergey Shulgin, Akhlitdin Nizamitdinov, Haitz Sáez de Ocáriz Borde
TL;DR
The study addresses model drift in financial ML under abrupt macroeconomic shocks, focusing on developing economies. It introduces a two-level drift-evaluation framework built on a Distribution Shift (DS) metric and two stability metrics, Stabilization Score (SS) and Stabilization Uplift (SU), and demonstrates that incorporating synthetic outliers generated with zGAN can improve post-shock stability on macroeconomic tabular data. Key contributions include formalizing DS, SS, and SU; showing that small, dataset-dependent shares of synthetic outliers often improve stability; and validating the approach on private datasets from Tajikistan, Uzbekistan, Kazakhstan, Jordan, and Azerbaijan. This data-centric stabilization strategy offers a practical path to more robust risk decisions in volatile economies and highlights the role of model architecture and data augmentation in resilience to shocks.
Abstract
Machine learning models in finance are highly susceptible to model drift, where predictive performance declines as data distributions shift. This issue is especially acute in developing economies such as those in Central Asia and the Caucasus (including Tajikistan, Uzbekistan, Kazakhstan, and Azerbaijan), where frequent and unpredictable macroeconomic shocks destabilize financial data. To the best of our knowledge, this is among the first studies to examine drift mitigation methods on financial datasets from these regions. We investigate the use of synthetic outliers, a largely unexplored approach, to improve model stability against unforeseen shocks. To evaluate effectiveness, we introduce a two-level framework that measures both the extent of performance degradation and the severity of shocks. Our experiments on macroeconomic tabular datasets show that adding a small proportion of synthetic outliers generally improves stability compared to baseline models, though the optimal amount varies by dataset and model.
