PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis

Md Robiul Islam, Md Mahamodul Islam, Mst. Suraiya Afrin, Anika Antara, Nujhat Tabassum, Al Amin

TL;DR

This work tackles phishing URL detection with a fine-tuned 1D convolutional neural network trained on a large dataset described by 21 URL-derived features. It pairs high predictive performance (99.85% accuracy and 99.80% F1) with SHAP-based explainability that reveals which URL features drive each classification. The dataset blends PhishTank phishing URLs with legitimate URLs from URL2016D41, and the explainability analysis covers both global and local explanations. The approach shows practical potential for real-time, transparent phishing detection and for explainable cybersecurity models.

Abstract

Cybersecurity has become a global concern because individuals, industries, and organizations depend heavily on cyber systems. Among cyber attacks, phishing is growing rapidly and harming the global economy, which underscores the need for greater user awareness and robust support at both individual and organizational levels. Identifying phishing URLs is one of the most effective ways to address the problem. Various machine learning and deep learning methods have been proposed to automate phishing URL detection. However, these approaches often fall short of convincing accuracy and rely on datasets with limited samples. Furthermore, the decisions of these black-box models need proper explanation so that the features affecting the output can be understood. To address these issues, we propose a 1D Convolutional Neural Network (CNN) and train it with an extensive feature set and a substantial amount of data. The proposed model outperforms existing works, attaining an accuracy of 99.85%. Additionally, our explainability analysis highlights the features that contribute most to identifying phishing URLs.
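
To make "URL-derived features" concrete, the sketch below computes a few lexical features from a raw URL using only the Python standard library. The feature names and the choice of features are illustrative assumptions; the paper's actual 21 features are not listed in this section, and the function name extract_url_features is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def _is_ip_address(host: str) -> bool:
    """Return True if the host is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute a handful of lexical URL features.

    Assumption: these features are examples only, not the 21 features
    used in the paper.
    """
    # urlsplit needs a scheme to locate the host reliably.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": int(_is_ip_address(host)),
    }


if __name__ == "__main__":
    for example in ("https://www.example.com/",
                    "http://192.168.0.1/login@secure-update.php"):
        print(example, extract_url_features(example))
```

A fixed-length numeric vector of this kind is the sort of input a 1D CNN over tabular URL features would consume, with each URL contributing one row.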

Paper Structure

This paper contains 17 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Proposed Methodology Workflow
  • Figure 2: (a) Training and Testing accuracy per epoch, (b) Training and Testing loss per epoch.
  • Figure 3: Hierarchy of features by their role in classification, with their average impact on model output magnitude.
  • Figure 4: SHAP value decision plot.
  • Figure 5: SHAP value waterfall plot.
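
For readers unfamiliar with what the SHAP plots in Figures 3 to 5 display, the sketch below computes exact Shapley values for a toy scoring function by enumerating every feature subset. It is not the SHAP library's implementation and not the paper's model: the linear scorer, its weights, the three feature names, and the single baseline vector are all assumptions chosen so the result can be checked by hand. A waterfall plot shows the same decomposition, starting from the baseline output and adding one feature's contribution at a time until it reaches the model output.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy scorer over three made-up features:
# [url_length (scaled), uses_https, has_at_symbol]
WEIGHTS = [0.5, -1.0, 2.0]


def toy_score(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))


if __name__ == "__main__":
    x = [3.0, 1.0, 1.0]
    baseline = [1.0, 0.0, 0.0]
    phi = shapley_values(toy_score, x, baseline)
    print("contributions:", phi)
    print("baseline + sum:", toy_score(baseline) + sum(phi))
    print("model output:  ", toy_score(x))
```

For a linear scorer each contribution reduces to the weight times the feature's deviation from the baseline, which makes the output easy to verify. The contributions also sum to the difference between the model output and the baseline output, which is the property a waterfall plot visualizes.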