Enhancing SQL Injection Detection and Prevention Using Generative Models
Naga Sai Dasari, Atta Badii, Armin Moin, Ahmed Ashlam
TL;DR
This work addresses the evolving threat of SQL injection (SQLi) by augmenting limited labeled data with synthetic queries generated by a Variational Autoencoder (VAE), U-Net, and CWGAN-GP. The pipeline encodes SQL queries into latent representations, generates diverse synthetic samples, and applies pseudo-labelling to build hybrid datasets of real and synthetic queries for training an XGBoost classifier. The results indicate that synthetic data improves detection performance and robustness, yielding high accuracy and balanced per-class performance, particularly when U-Net and CWGAN-GP outputs are combined. The findings highlight the practical potential of generative data augmentation for adaptive SQLi detection in real-world web applications.
Abstract
SQL Injection (SQLi) continues to pose a significant threat to the security of web applications, enabling attackers to manipulate databases and access sensitive information without authorisation. Despite advances in detection techniques, traditional signature-based methods still struggle to identify sophisticated SQLi attacks that evade predefined patterns. As these attacks evolve, more adaptive detection systems become essential. This paper introduces an approach that uses generative models to strengthen SQLi detection and prevention. Synthetic SQL queries were generated with Variational Autoencoders (VAE), a Conditional Wasserstein GAN with Gradient Penalty (CWGAN-GP), and U-Net, and used to augment the training datasets of machine learning models. The proposed method improved SQLi detection accuracy, reducing both false positives and false negatives. Extensive empirical testing further showed that the system adapts to evolving SQLi attack patterns, with gains in precision and robustness.
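To make the pseudo-labelling step of the pipeline summarised in the TL;DR more concrete, the following is a minimal sketch of one common form of it: a model fitted on real labeled data assigns labels to unlabeled synthetic samples, and only confident predictions are merged into the hybrid training set. Everything here is an illustrative assumption rather than the paper's implementation: the 2-D vectors stand in for latent representations, the hard-coded `X_synth` list stands in for generator outputs, a toy nearest-centroid classifier replaces XGBoost so the example needs only the standard library, and the 0.8 confidence threshold is hypothetical.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit_centroids(X, y):
    """Toy stand-in for the classifier: one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}

def predict_with_confidence(centroids, x):
    """Return (label, confidence) from a softmax over negative distances."""
    weights = {label: math.exp(-math.dist(x, c))
               for label, c in centroids.items()}
    total = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / total

def pseudo_label(centroids, synthetic, threshold=0.8):
    """Keep only synthetic vectors whose predicted label clears the threshold."""
    kept_X, kept_y = [], []
    for x in synthetic:
        label, conf = predict_with_confidence(centroids, x)
        if conf >= threshold:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y

# Hypothetical 2-D "latent" vectors of real queries; 0 = benign, 1 = SQLi.
X_real = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
y_real = [0, 0, 1, 1]

# Hypothetical unlabeled generator outputs; the last one is ambiguous.
X_synth = [[0.1, 0.2], [3.2, 2.8], [1.5, 1.6]]

model = fit_centroids(X_real, y_real)
X_pl, y_pl = pseudo_label(model, X_synth)

# Hybrid dataset: real samples plus confidently pseudo-labelled synthetic ones.
X_hybrid = X_real + X_pl
y_hybrid = y_real + y_pl
```

In this toy run the ambiguous synthetic point near the midpoint of the two classes falls below the threshold and is discarded, which is the purpose of confidence filtering: it limits how much label noise the synthetic data adds to the final training set.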
