LLMs are One-Shot URL Classifiers and Explainers
Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne
TL;DR
This work introduces a one-shot LLM-based framework for phishing URL classification that uses Chain-of-Thought prompting to produce both a label and a natural language explanation. Evaluated on three URL datasets with five state-of-the-art LLMs, the approach achieves performance close to fully supervised URL classifiers, with GPT-4 Turbo consistently leading (average F1 ≈ $0.92$). The authors quantify explanation quality via alignment with LIME post-hoc indicators and G-Eval metrics (readability, coherence, informativeness), showing strong explainability for top models and variable performance for others. Extended analyses demonstrate robustness in zero-/few-shot settings and highlight practical benefits, notably better cross-dataset generalisation than traditional supervised methods and the potential for user-friendly, explainable phishing warnings in real-world deployments.
Abstract
Malicious URL classification is a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to provide natural-language explanations for their classifications. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address these issues. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting achieves performance close to that of supervised models, with GPT-4 Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of them align with the post-hoc explanations of the supervised classifiers and have high readability, coherence, and informativeness.
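To make the one-shot Chain-of-Thought setup described in the abstract more concrete, the following is a minimal sketch of how such a prompt might be assembled and how a model's reply might be parsed into a label and an explanation. The prompt wording, the demonstration URL, the JSON reply format, and the function names `build_one_shot_prompt` and `parse_response` are all illustrative assumptions; they are not the authors' actual prompt or code, and no LLM API is called here.

```python
import json
from typing import Tuple

# Hypothetical one-shot demonstration (assumed for illustration only;
# this is not the prompt or example used in the paper).
EXAMPLE_URL = "http://paypa1-secure-login.example.com/verify"
EXAMPLE_ANSWER = {
    "reasoning": (
        "The host imitates a payment brand using a digit substitution "
        "and adds trust words such as 'secure' and 'login', while the "
        "registered domain is unrelated to that brand."
    ),
    "label": "phishing",
}


def build_one_shot_prompt(url: str) -> str:
    """Build a prompt with one worked example followed by the query URL."""
    lines = [
        "You are a security analyst. Decide whether a URL is benign or phishing.",
        "Think step by step about the domain, subdomains, path, and brand impersonation,",
        'then reply with a JSON object with the keys "reasoning" and "label".',
        "",
        f"URL: {EXAMPLE_URL}",
        f"Answer: {json.dumps(EXAMPLE_ANSWER)}",
        "",
        f"URL: {url}",
        "Answer:",
    ]
    return "\n".join(lines)


def parse_response(text: str) -> Tuple[str, str]:
    """Extract (label, reasoning) from a reply containing one JSON object."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model reply")
    data = json.loads(text[start:end + 1])
    label = str(data["label"]).strip().lower()
    if label not in {"benign", "phishing"}:
        raise ValueError(f"unexpected label: {label!r}")
    return label, str(data["reasoning"])
```

In this sketch, the string returned by `build_one_shot_prompt` would be sent to whichever LLM is being evaluated, and `parse_response` would turn the reply into the predicted label plus the natural-language explanation that the paper's analysis then scores. Requesting a structured reply is a design choice made here for easy parsing; free-text replies would need a different extraction step.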
