Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails
Malte Josten, Torben Weis
TL;DR
The paper examines how vulnerable Bayesian spam filters are to LLM-modified emails. The authors build a pipeline that rewrites spam content using GPT-3.5 Turbo and evaluates SpamAssassin's detection performance on the result. They introduce metrics for success rate and semantic similarity, and compare the LLM-modified content against a dictionary-replacement baseline. SpamAssassin misclassifies up to 73.7% of LLM-modified spam as legitimate, with overall ham conversion reaching 95.8% of original spam, whereas the dictionary attack succeeds in at most 0.4% of cases. The LLM-based attack costs 0.17 cents per email. The work highlights significant vulnerabilities in current spam filtering and emphasizes the need for improved defenses, broader datasets, and evaluation across multiple LLMs and configurations.
Abstract
Spam and phishing remain critical threats in cybersecurity, responsible for nearly 90% of security incidents. As these attacks grow in sophistication, the need for robust defensive mechanisms intensifies. Bayesian spam filters, like the widely adopted open-source SpamAssassin, are essential tools in this fight. However, the emergence of large language models (LLMs) such as ChatGPT presents new challenges. These models are not only powerful and accessible, but also inexpensive to use, raising concerns about their misuse in crafting sophisticated spam emails that evade traditional spam filters. This work evaluates the robustness and effectiveness of SpamAssassin against LLM-modified email content. We developed a pipeline that modifies spam emails using GPT-3.5 Turbo and assesses whether SpamAssassin still classifies the modified emails correctly. The results show that SpamAssassin misclassified up to 73.7% of LLM-modified spam emails as legitimate. In contrast, a simpler dictionary-replacement attack showed a maximum success rate of only 0.4%. These findings highlight the significant threat posed by LLM-modified spam, especially given the cost-efficiency of such attacks (0.17 cents per email). This paper provides crucial insights into the vulnerabilities of current spam filters and the need for continuous improvement in cybersecurity measures.
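To make the headline numbers concrete, the following is a minimal sketch of how a success rate and a campaign cost could be computed from filter verdicts. It is not the authors' implementation. The `Verdict` type, the function names, and the definition of success used here (an email flagged as spam before rewriting and classified as ham afterwards) are assumptions made for illustration; only the 0.17 cents per email figure comes from the abstract.

```python
from dataclasses import dataclass

# Per-email cost reported in the abstract, in US cents.
COST_PER_EMAIL_CENTS = 0.17


@dataclass
class Verdict:
    """Hypothetical record of filter decisions for one email.

    True means the filter flagged the email as spam.
    """
    original_is_spam: bool
    modified_is_spam: bool


def evasion_rate(verdicts):
    """Share of originally detected spam that passes as ham after rewriting.

    Assumed definition: only emails the filter caught in their original
    form count towards the denominator.
    """
    detected = [v for v in verdicts if v.original_is_spam]
    if not detected:
        return 0.0
    evaded = sum(1 for v in detected if not v.modified_is_spam)
    return evaded / len(detected)


def campaign_cost_usd(n_emails, cents_per_email=COST_PER_EMAIL_CENTS):
    """Total rewriting cost in US dollars for n_emails."""
    return n_emails * cents_per_email / 100.0


if __name__ == "__main__":
    sample = [
        Verdict(original_is_spam=True, modified_is_spam=False),
        Verdict(original_is_spam=True, modified_is_spam=True),
        Verdict(original_is_spam=True, modified_is_spam=False),
        Verdict(original_is_spam=False, modified_is_spam=False),
    ]
    print(f"evasion rate: {evasion_rate(sample):.3f}")
    print(f"cost for 10,000 emails: ${campaign_cost_usd(10_000):.2f}")
```

In this toy sample, two of the three originally detected emails evade the filter after rewriting. At the reported price, rewriting 10,000 emails would cost about 17 US dollars, which illustrates why the attack is described as cost-efficient.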
