Advancing SQL Injection Detection for High-Speed Data Centers: A Novel Approach Using Cascaded NLP

Kasim Tasdemir, Rafiullah Khan, Fahad Siddiqui, Sakir Sezer, Fatih Kurugollu, Sena Busra Yengec-Tasdemir, Alperen Bolat

TL;DR

This work introduces a novel cascade SQLi detection method that blends classical and transformer-based NLP models. It achieves 99.86% detection accuracy with significantly lower computational demands, running 20 times faster than transformer-based models alone.

Abstract

Detecting SQL Injection (SQLi) attacks is crucial for web-based data center security, but balancing accuracy and computational efficiency is challenging, especially in high-speed networks. Traditional methods struggle with this balance, while NLP-based approaches, although accurate, are computationally intensive. We introduce a novel cascade SQLi detection method that blends classical and transformer-based NLP models. It achieves 99.86% detection accuracy with significantly lower computational demands, running 20 times faster than transformer-based models alone. Our approach is tested in a realistic setting and compared with 35 other methods, including machine learning-based and transformer models such as BERT, on a dataset of over 30,000 SQL sentences. Our results show that this hybrid method detects SQLi effectively in high-traffic environments, offering accurate and computationally efficient protection against SQLi vulnerabilities. The code is available at https://github.com/gdrlab/cascaded-sqli-detection .
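
The cascade idea can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the repository linked above). The names `cheap_stage`, `expensive_stage`, `classify`, and `SUSPICIOUS_TOKENS` are hypothetical, and both stages are toy rule-based stand-ins for the classical and transformer-based models. The sketch only shows the control flow: a fast, high-recall first stage clears most traffic, and only the payloads it flags reach the slower second stage.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical token list for the toy first stage (not taken from the paper).
SUSPICIOUS_TOKENS = ("'", '"', "--", ";", " or ", " union ", "=")


def cheap_stage(payload: str) -> bool:
    """Stage 1 stand-in: fast and deliberately over-sensitive (high recall)."""
    text = f" {payload.lower()} "
    return any(token in text for token in SUSPICIOUS_TOKENS)


def expensive_stage(payload: str) -> bool:
    """Stage 2 stand-in for a slower, more precise model."""
    text = f" {payload.lower()} "
    has_tautology = (" or " in text) and ("=" in text)
    return has_tautology or ("union select" in text)


def classify(
    payloads: Iterable[str],
    stage1: Callable[[str], bool] = cheap_stage,
    stage2: Callable[[str], bool] = expensive_stage,
) -> Tuple[List[bool], int]:
    """Return one attack label per payload and the number of stage-2 calls."""
    labels: List[bool] = []
    stage2_calls = 0
    for payload in payloads:
        if not stage1(payload):
            # Cleared by the cheap stage: the expensive model never runs.
            labels.append(False)
            continue
        stage2_calls += 1
        # Flagged by stage 1: stage 2 confirms or dismisses the alarm.
        labels.append(stage2(payload))
    return labels, stage2_calls


if __name__ == "__main__":
    samples = ["alice", "O'Brien", "' OR '1'='1", "1 UNION SELECT password FROM users"]
    print(classify(samples))
```

In this toy run the first stage raises a false alarm on `O'Brien` because of the apostrophe, and the second stage dismisses it. The second stage is skipped entirely for `alice`, which is where a cascade saves computation when most traffic is benign.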

Paper Structure (18 sections, 11 equations, 11 figures, 3 tables)


Figures (11)

  • Figure 1: High-level structure of the proposed cascade SQLi attack detection method, inspired by Viola et al. [viola2001rapid]. The first stage captures over 99.73% of potential attacks (recall) at a low computational cost. Suspicious SQL payloads are then passed to the second stage, where a transformer-based model re-analyses them to reduce false alarms. This two-stage cascade makes detection 20× faster while maintaining high accuracy, as shown by an F1 score of 0.9981. The first-stage model can be replaced dynamically based on the introduced FE score, enabling adaptive and optimised detection performance.
  • Figure 2: Comparison of all methods by the introduced F1 efficiency (FE) score in two scenarios: (a) $\alpha=1.0$ and (b) $\alpha=0.98$. The proposed method is the most favourable choice when the F1 score and inference latency are weighted 98% and 2%, respectively. When $\alpha$ is lowered from 1.00 to 0.98, the transformer-based models move towards the lower end of the list because of their higher inference latencies. In scenarios where computational resources are limited and both speed and accuracy matter (2% and 98% importance, respectively), the proposed method performs best. The method categories and their order within each group, denoted by colours and shades, respectively, are consistent with those in Fig. \ref{fig:all-F1-vs-inference}.
  • Figure 3: SQL injection attack. Benign users provide their genuine usernames and passwords. An attacker instead submits an SQL injection statement such as "or" "=", a condition that is always TRUE, and thereby retrieves all rows from the 'Users' table in the database.
  • Figure 4: The single NLP process, which combines classical ML classifiers with the features given in Eq. \ref{eq:features-list} (TF-IDF-based and bag-of-characters/words-based features). The SQL payload is first tokenized and cleaned in the pre-processing phase. Features are then extracted with the selected method and used to train the selected model.
  • Figure 5: Proposed ensemble approach 1. Each classifier type is used with multiple features.
  • ...and 6 more figures
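
The always-TRUE injection described in the Figure 3 caption can be reproduced with a small, self-contained Python sketch using the standard-library `sqlite3` module. Everything here is assumed for illustration: the table layout, the column names `name` and `pw`, the helpers `login_unsafe` and `login_safe`, and the payload. The payload is the common single-quote tautology `' OR '1'='1`, a stand-in rather than the exact string shown in the figure.

```python
import sqlite3

# In-memory database with a hypothetical 'Users' table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (name TEXT, pw TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])


def login_unsafe(name: str, password: str) -> list:
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = f"SELECT name FROM Users WHERE name = '{name}' AND pw = '{password}'"
    return sorted(row[0] for row in conn.execute(query))


def login_safe(name: str, password: str) -> list:
    """Parameterised: user input is bound as data and never parsed as SQL."""
    query = "SELECT name FROM Users WHERE name = ? AND pw = ?"
    return sorted(row[0] for row in conn.execute(query, (name, password)))


# Tautology payload: the trailing OR '1'='1' makes the WHERE clause always true.
payload = "' OR '1'='1"

if __name__ == "__main__":
    print(login_unsafe("alice", "a1"))     # a benign login matches a single row
    print(login_unsafe(payload, payload))  # the injection matches every row
    print(login_safe(payload, payload))    # bound parameters match no row
```

With string concatenation the payload rewrites the query's logic, so every row of the table is returned. With bound parameters the same string is compared as a literal value and matches nothing.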