PhishGuard: A Convolutional Neural Network Based Model for Detecting Phishing URLs with Explainability Analysis
Md Robiul Islam, Md Mahamodul Islam, Mst. Suraiya Afrin, Anika Antara, Nujhat Tabassum, Al Amin
TL;DR
This work tackles phishing URL detection by introducing a fine-tuned 1D convolutional neural network trained on a large, feature-rich dataset with 21 URL-derived features. It combines high predictive performance, achieving 99.85% accuracy and 99.80% F1, with SHAP-based explainability to reveal which URL features drive classifications. The dataset blends PhishTank phishing URLs with legitimate URLs from URL2016D41, and the model architecture emphasizes interpretability through global and local explanations. The approach demonstrates strong practical potential for real-time, transparent phishing detection and contributes to standards in dataset use and explainable cybersecurity models.
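To make "URL-derived features" concrete, the following is a minimal sketch of lexical feature extraction using only the Python standard library. The function name `extract_url_features` and the nine features it computes are illustrative assumptions; this section does not list the paper's 21 features, so they should not be read as the authors' feature set.

```python
import re
from urllib.parse import urlsplit


def extract_url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    These features are hypothetical examples, not the paper's 21 features.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        # Raw IPv4 hosts are a commonly cited phishing indicator.
        "host_is_ipv4": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
        # Number of non-empty path segments.
        "path_depth": sum(1 for seg in parts.path.split("/") if seg),
    }


if __name__ == "__main__":
    example = "http://192.168.0.1/login-update/verify.php?id=1"
    for name, value in extract_url_features(example).items():
        print(f"{name}: {value}")
```

Each URL becomes a fixed-length numeric vector in this scheme, which is the kind of input a 1D CNN and a SHAP explainer can both operate on.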
Abstract
Cybersecurity has become a global concern because individuals, industries, and organizations depend heavily on cyber systems. Among cyber attacks, phishing is growing rapidly and harming the global economy. This trend highlights the need for greater user awareness and robust support at both the individual and organizational levels. Identifying phishing URLs is one of the most effective ways to address the problem. Various machine learning and deep learning methods have been proposed to automate phishing URL detection. However, these approaches often fall short of convincing accuracy and rely on datasets with a limited number of samples. Furthermore, the decisions of these black-box models need proper explanation so that the features affecting the output can be understood. To address these issues, we propose a 1D Convolutional Neural Network (CNN) and train it on an extensive feature set and a substantial amount of data. The proposed model outperforms existing works, attaining an accuracy of 99.85%. Additionally, our explainability analysis highlights the features that contribute most to identifying phishing URLs.
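As a rough illustration of what a 1D convolutional layer does with a feature vector, the sketch below implements a single filter in plain Python: a kernel slides over the input with stride 1 and no padding, followed by ReLU and global max pooling. The function names, kernel values, and input vector are assumptions chosen for illustration; the paper's actual layer sizes, hyperparameters, and training procedure are not given in this section.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Apply one 1D filter (stride 1, no padding) followed by ReLU.

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped before sliding.
    """
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        window = signal[start:start + k]
        activation = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(max(0.0, activation))  # ReLU
    return out


def global_max_pool(feature_map):
    """Reduce a feature map to its single strongest response."""
    return max(feature_map)


if __name__ == "__main__":
    # Hypothetical 21-dimensional feature vector for one URL.
    features = [float(i % 4) for i in range(21)]
    kernel = [-1.0, 0.0, 1.0]  # illustrative weights, not learned
    fmap = conv1d_valid(features, kernel)
    print(len(fmap), global_max_pool(fmap))
```

With a 21-feature input and a kernel of width 3, the feature map has 21 - 3 + 1 = 19 positions. A trained network would learn many such kernels and pass the pooled responses to dense layers for the phishing versus legitimate decision.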
