Detecting Zero-Day Web Attacks with an Ensemble of LSTM, GRU, and Stacked Autoencoders
Vahid Babaey, Hamid Reza Faragardi
TL;DR
This work tackles zero-day web attack detection by learning the distribution of normal web requests and flagging deviations with a one-class ensemble of three autoencoders: an LSTM autoencoder, a GRU autoencoder, and a stacked autoencoder. A novel tokenization strategy converts requests into structured numeric sequences, and a latent fusion step concatenates and compresses the sub-model representations to produce a robust anomaly score. The approach reports 97.58% accuracy, 97.52% recall, 99.76% specificity, and 99.99% precision, with a false positive rate of about 0.2%, which highlights its potential for practical deployment in web security. Because the model is trained solely on normal traffic and framed as anomaly detection, it is designed to detect unseen threats while keeping false alarms low. Future work aims to generalize across datasets and to extend the system to attack-type classification.
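The reported metrics are the usual confusion-matrix quantities, with "attack" as the positive class. By definition the false positive rate equals one minus specificity, so a specificity of 99.76% corresponds to a false positive rate of 0.24%, which agrees with the reported 0.2% after rounding. The sketch below only illustrates these definitions. The function name and the counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-detection metrics, with 'attack' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn),                # share of attacks that are caught
        "specificity": tn / (tn + fp),           # share of normal requests passed
        "precision": tp / (tp + fp),             # share of alerts that are real attacks
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
    }


# Hypothetical counts, chosen only for illustration (not the paper's data).
metrics = detection_metrics(tp=970, fp=2, tn=998, fn=30)
```

Because specificity and the false positive rate share the same denominator (all normal requests), they always sum to one, whatever the counts.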
Abstract
The rapid growth of web-based services has significantly increased the security risks to user information, as web-based attacks become more sophisticated and prevalent. Traditional security methods frequently struggle to detect previously unknown (zero-day) web attacks, putting sensitive user data at significant risk. Additionally, reducing human intervention in web security tasks can minimize errors and enhance reliability. This paper introduces an intelligent system that detects zero-day web attacks using a novel one-class ensemble of three distinct autoencoder architectures: an LSTM autoencoder, a GRU autoencoder, and a stacked autoencoder. A novel tokenization strategy converts normal web requests into structured numeric sequences, and the ensemble identifies anomalous activity by concatenating and compressing the latent representations produced by the three autoencoders. The proposed method detects unknown web attacks efficiently while addressing common limitations of previous methods, such as high memory consumption and excessive false positive rates. Extensive experimental evaluations show that the proposed ensemble performs strongly, achieving 97.58% accuracy, 97.52% recall, 99.76% specificity, and 99.99% precision, with a low false positive rate of 0.2%. These results underscore the method's potential to enhance real-world web security through accurate and reliable detection of web-based attacks.
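To make the two ends of the described pipeline concrete, the sketch below shows one possible way to turn a request into a fixed-length numeric sequence and one possible way to set a one-class decision threshold from scores measured on normal traffic only. It is a minimal illustration under stated assumptions, not the paper's tokenizer or decision rule. The token pattern, the reserved ids `PAD` and `UNK`, the sequence length, the quantile, and all function names are hypothetical. The autoencoder ensemble that would produce the anomaly scores is left out.

```python
import re
from urllib.parse import unquote_plus

PAD, UNK = 0, 1  # reserved ids: padding and unseen tokens (assumed convention)


def tokenize(request: str) -> list[str]:
    """Decode a request string and split it into word and symbol tokens."""
    decoded = unquote_plus(request).lower()
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", decoded)


def build_vocab(normal_requests: list[str]) -> dict[str, int]:
    """Assign an integer id to every token seen in normal traffic."""
    vocab: dict[str, int] = {}
    for request in normal_requests:
        for token in tokenize(request):
            vocab.setdefault(token, len(vocab) + 2)  # ids 0 and 1 are reserved
    return vocab


def encode(request: str, vocab: dict[str, int], max_len: int = 16) -> list[int]:
    """Map a request to a fixed-length id sequence (truncate, then pad)."""
    ids = [vocab.get(token, UNK) for token in tokenize(request)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def calibrate_threshold(normal_scores: list[float], quantile: float = 0.99) -> float:
    """Pick a cut-off near the given quantile of scores from normal requests."""
    ranked = sorted(normal_scores)
    index = min(len(ranked) - 1, int(quantile * len(ranked)))
    return ranked[index]


# Hypothetical normal traffic used to build the vocabulary.
normal = ["GET /index.jsp?id=2", "GET /cart.jsp?item=7"]
vocab = build_vocab(normal)
seen = encode("GET /index.jsp?id=2", vocab)
unseen = encode("GET /index.jsp?id=2' OR '1'='1", vocab)
```

In this sketch, tokens that never occurred in normal traffic collapse to `UNK`, so an injected payload yields a sequence unlike those in the training data. A request would then be flagged when its anomaly score exceeds the value returned by `calibrate_threshold`, which is computed without using any attack samples.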
