Investigating the Effectiveness of Bayesian Spam Filters in Detecting LLM-modified Spam Mails

Malte Josten; Torben Weis

TL;DR

The paper examines how vulnerable Bayesian spam filters are to LLM-modified emails. It builds a pipeline that rewrites spam content using GPT-3.5 Turbo and evaluates SpamAssassin's detection performance on the result. It introduces metrics for success rate and semantic similarity, and compares LLM-modified content to a dictionary-replacement baseline. Key findings show SpamAssassin misclassifies up to 73.7% of LLM-modified spam, with overall ham conversion reaching 95.8% of original spam, while the dictionary attack remains weak at 0.4%, all at a cost of 0.17 cents per email. The work highlights significant vulnerabilities in current spam filtering and emphasizes the need for improved defenses, broader datasets, and evaluation across multiple LLMs and configurations.

Abstract

Spam and phishing remain critical threats in cybersecurity, responsible for nearly 90% of security incidents. As these attacks grow in sophistication, the need for robust defensive mechanisms intensifies. Bayesian spam filters, like the widely adopted open-source SpamAssassin, are essential tools in this fight. However, the emergence of large language models (LLMs) such as ChatGPT presents new challenges. These models are not only powerful and accessible, but also inexpensive to use, raising concerns about their misuse in crafting sophisticated spam emails that evade traditional spam filters. This work aims to evaluate the robustness and effectiveness of SpamAssassin against LLM-modified email content. We developed a pipeline to test this vulnerability. Our pipeline modifies spam emails using GPT-3.5 Turbo and assesses SpamAssassin's ability to classify these modified emails correctly. The results show that SpamAssassin misclassified up to 73.7% of LLM-modified spam emails as legitimate. In contrast, a simpler dictionary-replacement attack showed a maximum success rate of only 0.4%. These findings highlight the significant threat posed by LLM-modified spam, especially given the cost-efficiency of such attacks (0.17 cents per email). This paper provides crucial insights into the vulnerabilities of current spam filters and the need for continuous improvement in cybersecurity measures.
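The headline numbers above (misclassification rate and per-email cost) reduce to simple arithmetic. The following is a minimal sketch of how such figures could be computed, not the authors' implementation: the function names and the `"ham"`/`"spam"` label strings are assumptions made for illustration, and only the 0.17 cents per email figure comes from the abstract.

```python
def success_rate(labels):
    """Fraction of modified spam emails that the filter labeled as ham.

    `labels` is a list of classification labels ("ham" or "spam"), one per
    modified spam email. The label strings are an assumption of this sketch.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    ham_count = sum(1 for label in labels if label == "ham")
    return ham_count / len(labels)


def total_cost_cents(num_emails, cost_per_email_cents=0.17):
    """Estimated cost in cents of rewriting `num_emails` emails.

    The default of 0.17 cents per email is the figure reported in the abstract.
    """
    return num_emails * cost_per_email_cents
```

For example, if three of four rewritten emails pass the filter, `success_rate(["ham", "spam", "ham", "ham"])` gives 0.75, and rewriting 1,000 emails at the reported rate would cost about 170 cents.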

Paper Structure (9 sections, 4 figures, 4 tables)

Introduction
Related Work
Experimental Setup
Method
Pre-processing
LLM Modification
Robustness Evaluation
Evaluation
Conclusion

Figures (4)

Figure 1: To rephrase a spam email and test its resulting classification, the pipeline communicates with several components to (a) modify the spam email body, (b) send the email via SMTP to the mail server, where (c) Mailpit labels the email as either ham or spam with the help of SpamAssassin, and (d) retrieve the classification label through the Mailpit API.
Figure 2: Pipeline to (1) pre-process spam emails extracted from a spam dataset, (2) hand them over to the OpenAI API to be modified by the LLM, and (3) evaluate the spam filter's robustness against the rephrased email bodies.
Figure 3: Word cloud with reasons given by GPT-3.5-turbo-0125 on why it did not process the API request.
Figure 4: Cosine similarity for the datasets (a) original and (b) minimal.
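
Figure 4 reports cosine similarity between original and modified emails. How the paper represents the texts as vectors is not stated in this excerpt, so the sketch below assumes plain bag-of-words term counts as a stand-in; the `tokenize` and `cosine_similarity` helpers are hypothetical names introduced for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using term-count vectors.

    Assumption: bag-of-words counts stand in for whatever text
    representation the paper actually uses.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the terms shared by both texts.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Under this representation, identical texts score 1.0 and texts with no shared terms score 0.0; a rewrite that preserves some wording falls in between.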
