A Split-Window Transformer for Multi-Model Sequence Spammer Detection using Multi-Model Variational Autoencoder
Zhou Yang, Yucai Pang, Hongbo Yin, Yunpeng Xiao
TL;DR
The paper tackles multi-modal spammer detection over ultra-long sequences of user history. It introduces MS$^2$Dformer, a Transformer backbone that combines MVAE-based two-channel tokenization of multi-modal user history with a hierarchical split-window multi-head attention mechanism (SW/W-MHA), which keeps attention over ultra-long sequences tractable. The model has four stages: MVAE-based tokenization; intra-window SW-MHA and inter-window W-MHA, which capture short- and long-term dependencies respectively; deeper sequence feature mining; and a classifier head. It is trained with a total loss that combines the MVAE reconstruction losses with cross-entropy. Experiments on Weibo datasets report state-of-the-art accuracy and efficiency, supporting the model's use as a backbone for real-world multi-modal sequence spammer detection and suggesting it may transfer to other ultra-long sequence tasks.
Abstract
This paper introduces a new Transformer, called MS$^2$Dformer, that can be used as a generalized backbone for multi-modal sequence spammer detection. Spammer detection is a complex multi-modal task, so applying a Transformer to it poses two challenges. First, complex and noisy multi-modal information about users can interfere with feature mining. Second, the long sequences of users' historical behaviors place heavy GPU memory pressure on the attention computation. To solve these problems, we first design a user behavior tokenization algorithm based on the multi-modal variational autoencoder (MVAE). We then propose a hierarchical split-window multi-head attention (SW/W-MHA) mechanism. The split-window strategy hierarchically decomposes attention over an ultra-long sequence into intra-window short-term attention and inter-window overall attention. Pre-trained on public datasets, MS$^2$Dformer far exceeds the previous state of the art. The experiments demonstrate MS$^2$Dformer's ability to act as a backbone.
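
To make the memory argument concrete, the following is a minimal counting sketch, not the authors' implementation. It compares the number of pairwise attention scores per head for full self-attention with a two-level split-window scheme. It assumes, for illustration only, that each window is summarized by a single token before the inter-window stage; the function names, the sequence length, and the window size are all hypothetical.

```python
import math


def full_attention_scores(seq_len: int) -> int:
    """Pairwise attention scores per head for full self-attention."""
    return seq_len * seq_len


def split_window_scores(seq_len: int, window: int) -> int:
    """Pairwise attention scores per head for a two-level window scheme.

    Assumption (not taken from the paper): each window contributes exactly
    one summary token to the inter-window stage, and the final window may
    be shorter than the others.
    """
    full_windows, remainder = divmod(seq_len, window)
    n_windows = math.ceil(seq_len / window)
    # Intra-window stage: every token attends only within its own window.
    intra = full_windows * window * window + remainder * remainder
    # Inter-window stage: window-level tokens attend to one another.
    inter = n_windows * n_windows
    return intra + inter


if __name__ == "__main__":
    seq_len, window = 4096, 64  # hypothetical sizes
    print(full_attention_scores(seq_len))        # 16777216
    print(split_window_scores(seq_len, window))  # 266240
```

Under these assumptions, the split-window scheme needs roughly 63 times fewer attention scores than full attention for a 4096-step history, which illustrates why a hierarchical decomposition can relieve GPU memory pressure.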
