An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection

Van Nguyen; Tingmin Wu; Xingliang Yuan; Marthie Grobler; Surya Nepal; Carsten Rudolph

TL;DR

The paper tackles the explainability problem in phishing detection by introducing AI2TALE, an information-theory-driven framework that localizes phishing-relevant content at the sentence level. It uses mutual information and an information bottleneck objective to jointly train a selector and a classifier in a weakly supervised setting, with a differentiable Gumbel-Softmax relaxation standing in for the discrete sentence selection. Evaluated on seven real-world email datasets against intrinsically interpretable baselines, AI2TALE improves the combined average of Label-Accuracy and Cognitive-True-Positive by about 1.5% to 3.5% and shows higher alignment with human cognitive triggers (SAC). The approach produces concise, human-interpretable explanations by highlighting the most influential sentence in each email, which makes the phishing prediction more transparent and actionable.
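To make the Gumbel-Softmax relaxation mentioned above concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the per-sentence scores in `sentence_logits`, the helper names `gumbel_softmax` and `select_top1`, and the temperature value are all assumptions chosen for illustration. It only shows how adding Gumbel noise and applying a temperature-scaled softmax yields soft selection weights that approach a one-hot choice of a single sentence as the temperature decreases.

```python
import math
import random


def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Return relaxed one-hot selection weights (non-negative, summing to 1).

    `logits` are hypothetical per-sentence scores from a selector network.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # sample from Gumbel(0, 1)
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_top1(weights):
    """Index of the sentence with the largest selection weight."""
    return max(range(len(weights)), key=weights.__getitem__)


# Hypothetical selector scores for a four-sentence email (illustrative values).
sentence_logits = [0.2, 2.5, -1.0, 0.4]
weights = gumbel_softmax(sentence_logits, temperature=0.5, rng=random.Random(0))
picked = select_top1(weights)
```

In a trainable model the same computation would be written with a framework that supports automatic differentiation, so that gradients from the classifier can flow back to the selector through the soft weights; the plain-Python version here only illustrates the sampling step.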

Abstract

Phishing attacks have become a serious and challenging problem for detection, explanation, and defense. Despite more than a decade of research encompassing both technical and non-technical remedies, phishing remains a serious threat. AI-based phishing detection now stands out as one of the most effective defenses, providing vulnerability (i.e., phishing or benign) predictions for the data. However, it lacks explainability: it does not provide comprehensive interpretations of its predictions, such as identifying the specific information that causes the data to be classified as phishing. To this end, we propose an innovative deep learning-based approach to phishing attack localization in email, the most common phishing channel. Our method not only predicts the vulnerability of the email data but also automatically learns to identify the most important and phishing-relevant information (i.e., sentences) in phishing emails, so that the selected information serves as a useful and concise explanation of the vulnerability. Rigorous experiments on seven diverse real-world email datasets show the effectiveness of our method in selecting crucial information and offering concise explanations for the vulnerability of phishing email data. In particular, our method outperforms state-of-the-art baselines by approximately 1.5% to 3.5%, as measured by the combined average of two main metrics, Label-Accuracy and Cognitive-True-Positive.

Paper Structure (45 sections, 13 equations, 3 figures, 6 tables)

Introduction
Related Work
Phishing attack detection
Phishing attack localization
The proposed approach
The problem statement
Methodology
Learning to select the important and phishing-relevant information and the training principle
Phishing-relevant information selection process
Reparameterization for continuous optimization
Mutual information for guiding the selection process
Benefits as well as potential weaknesses of the mutual information training principle and our innovative solutions
Obtaining a superset of phishing-relevant sentences
Encoding the vulnerability label via its selections instead of via truly meaningful information
A summary of our AI2TALE method
...and 30 more sections

Figures (3)

Figure 1: A visualization of our proposed AI2TALE method for solving the phishing attack localization problem.
Figure 2: Human evaluation on the importance of the top-$1$ selected information (i.e., a sentence) from each email (by our AI2TALE method) in affecting and persuading users to follow the instructions from the email. We evaluate the selected sentences of 10 different phishing emails (randomly chosen from the testing set).
Figure 3: An architecture of a simple deep neural network in a supervised learning context for the classification problem.
