LLMs are One-Shot URL Classifiers and Explainers

Fariza Rashid; Nishavi Ranaweera; Ben Doyle; Suranga Seneviratne

TL;DR

This work introduces a one-shot LLM-based framework for phishing URL classification that uses Chain-of-Thought prompting to produce both a label and a natural language explanation. Evaluated on three URL datasets with five state-of-the-art LLMs, the approach achieves performance close to fully supervised URL classifiers, with GPT-4 Turbo consistently leading (average F1 ≈ $0.92$). The authors quantify explanation quality via alignment with LIME post-hoc indicators and G-Eval metrics (readability, coherence, informativeness), showing strong explainability for top models and variable performance for others. Extended analyses demonstrate robustness in zero-/few-shot settings and highlight practical benefits, notably better cross-dataset generalisation than traditional supervised methods and the potential for user-friendly, explainable phishing warnings in real-world deployments.

Abstract

Malicious URL classification is a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to explain a given URL classification in natural human language. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address these issues. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting achieves performance close to that of supervised models, with GPT-4 Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of the explanations provided by LLMs align with the post-hoc explanations of the supervised classifiers, and that the explanations have high readability, coherence, and informativeness.
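As a rough illustration of the one-shot Chain-of-Thought setup described in the abstract, the sketch below assembles a prompt from a single worked example and parses a label from a model response. It is not the authors' prompt: the wording, the `Example` type, the `build_one_shot_prompt` and `parse_label` helpers, the example URLs, and the `Label:` output convention are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Example:
    """A single worked example (hypothetical structure, not from the paper)."""
    url: str
    reasoning: str
    label: str  # "benign" or "phishing"


def build_one_shot_prompt(example: Example, target_url: str) -> str:
    """Build a one-shot CoT prompt: one solved example, then the target URL."""
    return (
        "Decide whether each URL is benign or phishing. "
        "Reason step by step, then give the final answer as 'Label: <label>'.\n\n"
        f"URL: {example.url}\n"
        f"Reasoning: {example.reasoning}\n"
        f"Label: {example.label}\n\n"
        f"URL: {target_url}\n"
        "Reasoning:"
    )


def parse_label(response: str) -> Optional[str]:
    """Extract 'benign' or 'phishing' from a response, or None if absent."""
    match = re.search(r"label\s*:\s*(benign|phishing)", response, re.IGNORECASE)
    return match.group(1).lower() if match else None


# Hypothetical example and target URLs, used only to exercise the helpers.
demo = Example(
    url="http://paypa1-login.example.com/verify",
    reasoning="The host imitates a known brand with a digit substitution "
              "and asks the user to verify credentials.",
    label="phishing",
)
prompt = build_one_shot_prompt(demo, "https://www.example.org/docs")
```

Keeping the explanation before the label in the requested output format means the same response can supply both the classification and the natural-language explanation that the paper evaluates.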

Paper Structure (24 sections, 1 equation, 8 figures, 7 tables)

Introduction
Related Work
Phishing URL Detection
Few-shot classification using LLMs
Explainability of LLMs as classifiers
Our Framework
Experiment Settings
Datasets
Accuracy comparison with supervised URL classifiers
Quality evaluation of LLM Self-explanations
Alignment between LLM and LIME indicators
G-Eval Framework
Results
Prediction performance of one-shot LLM URL classifiers
Quality of LLM outputs
...and 9 more sections

Figures (8)

Figure 1: Example URL classification prompt and the output
Figure 2: LLM-based One-Shot URL Classification Framework
Figure 3: Prompting the LLM to list benign and phishing indicators identified in the self-explanation
Figure 4: Using the G-Eval Framework to assess the quality of LLM self-explanations
Figure 5: Cumulative distribution of the Jaccard similarity between LIME and LLM indicators - HP Dataset
...and 3 more figures
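
Figure 5 reports the Jaccard similarity between LIME and LLM indicators. The following minimal sketch shows how that measure is computed for two indicator sets. How the paper extracts and normalises indicators is not stated here, so the `jaccard_similarity` helper, the indicator strings, and the convention of returning 1.0 for two empty sets are assumptions.

```python
from typing import Iterable


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of indicators."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical indicator sets for one URL.
lime_indicators = {"login", "verify", "paypa1"}
llm_indicators = {"verify", "paypa1", "http"}
score = jaccard_similarity(lime_indicators, llm_indicators)
```

Here the two sets share two indicators out of four distinct ones, so the score is 0.5.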
