Research and application of artificial intelligence based webshell detection model: A literature review
Mingrui Ma, Lansheng Han, Chunjie Zhou
TL;DR
Webshell detection is challenged by obfuscation and data diversity, requiring AI-driven methods to distinguish malicious scripts from normal data. The paper surveys three development stages and classifies methods by data type and model family, highlighting a shift from manual feature engineering toward deep learning and transformer-based, code-related representations. Key contributions include a synthesis of data representations, vectorization strategies, and the evolution of models (machine learning, deep learning, hybrids), as well as a discussion of critical issues such as data quality and dataset availability. The work anticipates future directions such as few-shot, federated, and continual learning, and the use of LLMs and graph-based paradigms to enhance scalability and robustness.
Abstract
Webshells, the "culprit" behind numerous network attacks, are a research hotspot in the field of cybersecurity. However, their complexity, stealthiness, and obfuscation pose significant challenges to detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have begun applying a variety of intelligent algorithms and neural network architectures to the task of webshell detection. The related research nevertheless lacks a systematic and standardized methodological process, which leaves the literature confusing and redundant. Therefore, following the development timeline, we summarize the progress of research in this field, dividing it into three stages: Start Stage, Initial Development Stage, and In-depth Development Stage. We further elaborate on the main characteristics and core algorithms of each stage. In addition, we analyze the pain points and challenges that remain and offer our view of the field's future development trends. To the best of our knowledge, this is the first review that details research on AI-based webshell detection. We hope this paper provides detailed technical information for researchers interested in AI-based webshell detection.
