Poster: Long PHP webshell files detection based on sliding window attention

Zhiqiang Wang, Haoyu Wang, Lu Hao

TL;DR

This work tackles PHP webshell detection by shifting from opcode single-tuples (OSTs) to Opcode Double-Tuples (ODTs) and integrating CodeBERT with FastText embeddings. A sliding window attention mechanism enables efficient processing of long opcode sequences, addressing the memory and context limitations that arise with long scripts. Empirically, the method achieves 99.2% accuracy and 99.1% F1, outperforming several baselines, and demonstrates that ODTs capture richer low-level features than OSTs. The approach offers a practical, scalable solution for robust webshell detection and sets the stage for multi-language generalization.
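The difference between single-tuples and double-tuples can be illustrated with a minimal sketch. This summary does not define ODTs precisely, so the code below assumes an ODT is a pair of adjacent opcodes in the sequence; the function name `opcode_double_tuples` and the sample opcode trace are hypothetical and chosen only for illustration.

```python
from collections import Counter


def opcode_double_tuples(opcodes):
    """Pair each opcode with its successor.

    Assumption: an ODT is an adjacent (opcode, next_opcode) pair.
    The paper may define it differently.
    """
    return list(zip(opcodes, opcodes[1:]))


# Illustrative opcode trace, not taken from the paper's dataset.
ops = [
    "ASSIGN", "INIT_FCALL", "SEND_VAR", "DO_ICALL",
    "INIT_FCALL", "SEND_VAR", "DO_ICALL", "RETURN",
]

odts = opcode_double_tuples(ops)

# Single-tuple features only count how often each opcode occurs.
ost_counts = Counter(ops)

# Double-tuple features also record which opcode follows which.
odt_counts = Counter(odts)
```

Under this assumption, two files with identical opcode frequencies but different instruction order produce the same OST counts and different ODT counts, which is one plausible reading of why ODTs would carry richer low-level features.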

Abstract

A webshell is a type of backdoor, and web applications are widely exposed to webshell injection attacks, so webshell detection techniques are important to study. In this study, we propose a webshell detection method for PHP. We first convert PHP source code to opcodes and then extract Opcode Double-Tuples (ODTs). Next, we combine the CodeBERT and FastText models for feature representation and classification. Because deep learning methods have difficulty detecting long webshell files, we introduce a sliding window attention mechanism that captures malicious behavior within long files. Experimental results show that our method reaches high accuracy in webshell detection and addresses the difficulty traditional methods have with new webshell variants and anti-detection techniques.
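The abstract does not spell out how the sliding window attention is implemented. As a rough sketch of the general technique, the code below shows generic local attention in which each position attends only to neighbours within a fixed distance. It is single-head, has no learned projections, and uses plain Python lists; the function name and the `window` parameter are assumptions for illustration, not the authors' implementation.

```python
import math


def sliding_window_attention(queries, keys, values, window):
    """Scaled dot-product attention restricted to a local window.

    Position i attends only to positions j with |i - j| <= window, so the
    cost grows with n * window instead of n * n for a sequence of length n.
    """
    n = len(queries)
    dim = len(queries[0])
    outputs = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scores against the keys inside the window only.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the values inside the window.
        out = [
            sum(w / total * values[j][d] for w, j in zip(weights, range(lo, hi)))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy embeddings standing in for opcode-sequence token vectors.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local = sliding_window_attention(x, x, x, window=1)
```

Because each position sees only a bounded neighbourhood, memory use no longer scales quadratically with sequence length, which is the property that makes this family of mechanisms attractive for long opcode sequences.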

Paper Structure

This paper contains 5 sections, 1 figure.

Figures (1)

  • Figure 1: Overview of the detection method.