Enhancing SQL Injection Detection and Prevention Using Generative Models

Naga Sai Dasari, Atta Badii, Armin Moin, Ahmed Ashlam

TL;DR

This work tackles the evolving challenge of SQL Injection (SQLi) by augmenting limited labelled data with synthetic queries generated by a VAE, a U-Net, and a CWGAN-GP. The pipeline encodes SQL queries into latent representations, generates diverse synthetic data, and uses pseudo-labelling to create hybrid datasets for training an XGBoost classifier. The results show that synthetic data can significantly improve detection performance and robustness, achieving high accuracy and balanced class performance, especially when the U-Net and CWGAN-GP outputs are optimally combined. The findings highlight the practical potential of generative-data-driven augmentation for adaptive SQLi detection in real-world web applications.
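The pseudo-labelling step that turns generated samples into a hybrid training set can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a nearest-centroid scorer stands in for the XGBoost classifier, short 2-D vectors stand in for the latent query embeddings, and the function names and the 0.8 confidence threshold are hypothetical.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one centroid per class, 0 = benign, 1 = SQLi."""
    return {
        label: centroid([x for x, t in zip(X, y) if t == label])
        for label in set(y)
    }


def predict_proba(centroids, x):
    """Softmax over negative Euclidean distances to each class centroid."""
    labels = sorted(centroids)
    scores = [math.exp(-math.dist(x, centroids[label])) for label in labels]
    total = sum(scores)
    return {label: s / total for label, s in zip(labels, scores)}


def pseudo_label(centroids, synthetic, threshold=0.8):
    """Keep only synthetic samples whose top predicted class meets the threshold."""
    kept_X, kept_y = [], []
    for x in synthetic:
        proba = predict_proba(centroids, x)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def build_hybrid(X, y, synthetic, threshold=0.8):
    """Fit on labelled data, pseudo-label synthetic data, return the combined set."""
    model = fit_centroids(X, y)
    syn_X, syn_y = pseudo_label(model, synthetic, threshold)
    return X + syn_X, y + syn_y


# Toy embeddings (hypothetical): two benign and two SQLi points, plus three generated ones.
real_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
real_y = [0, 0, 1, 1]
synthetic = [[0.1, 0.4], [5.2, 5.4], [2.5, 3.0]]

hybrid_X, hybrid_y = build_hybrid(real_X, real_y, synthetic)
print(len(hybrid_X), hybrid_y)
```

In this toy run the third generated point lies equally far from both class centroids, so its confidence is 0.5 and it is discarded. Filtering on confidence in this way is one common method of limiting label noise in the hybrid set; the final classifier would then be trained on `hybrid_X` and `hybrid_y`.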

Abstract

SQL Injection (SQLi) continues to pose a significant threat to the security of web applications, enabling attackers to manipulate databases and access sensitive information without authorisation. Although advancements have been made in detection techniques, traditional signature-based methods still struggle to identify sophisticated SQL injection attacks that evade predefined patterns. As SQLi attacks evolve, the need for more adaptive detection systems becomes crucial. This paper introduces an innovative approach that leverages generative models to enhance SQLi detection and prevention mechanisms. By incorporating Variational Autoencoders (VAE), Conditional Wasserstein GAN with Gradient Penalty (CWGAN-GP), and U-Net, synthetic SQL queries were generated to augment training datasets for machine learning models. The proposed method demonstrated improved accuracy in SQLi detection systems by reducing both false positives and false negatives. Extensive empirical testing further illustrated the ability of the system to adapt to evolving SQLi attack patterns, resulting in enhanced precision and robustness.

Paper Structure

This paper contains 33 sections, 16 equations, 22 figures, 1 table.

Figures (22)

  • Figure 1: Pipeline Architecture
  • Figure 2: Accuracy and Training Time by Embedding Method
  • Figure 3: VAE Architecture for SQL Query Encoding
  • Figure 4: Training and Validation Loss during VAE Training
  • Figure 5: U-Net Model Architecture for SQL Query Generation
  • ...and 17 more figures