LLMs are One-Shot URL Classifiers and Explainers

Fariza Rashid, Nishavi Ranaweera, Ben Doyle, Suranga Seneviratne

TL;DR

This work introduces a one-shot LLM-based framework for phishing URL classification that uses Chain-of-Thought prompting to produce both a label and a natural language explanation. Evaluated on three URL datasets with five state-of-the-art LLMs, the approach achieves performance close to fully supervised URL classifiers, with GPT-4 Turbo consistently leading (average F1 ≈ $0.92$). The authors quantify explanation quality via alignment with LIME post-hoc indicators and G-Eval metrics (readability, coherence, informativeness), showing strong explainability for top models and variable performance for others. Extended analyses demonstrate robustness in zero-/few-shot settings and highlight practical benefits, notably better cross-dataset generalisation than traditional supervised methods and the potential for user-friendly, explainable phishing warnings in real-world deployments.

Abstract

Malicious URL classification represents a crucial aspect of cyber security. Although existing work comprises numerous machine learning and deep learning-based URL classification models, most suffer from generalisation and domain-adaptation issues arising from the lack of representative training datasets. Furthermore, these models fail to provide natural-language explanations for a given URL classification. In this work, we investigate and demonstrate the use of Large Language Models (LLMs) to address these issues. Specifically, we propose an LLM-based one-shot learning framework that uses Chain-of-Thought (CoT) reasoning to predict whether a given URL is benign or phishing. We evaluate our framework using three URL datasets and five state-of-the-art LLMs and show that one-shot LLM prompting achieves performance close to that of supervised models, with GPT-4 Turbo being the best model, followed by Claude 3 Opus. We conduct a quantitative analysis of the LLM explanations and show that most of the explanations provided by LLMs align with the post-hoc explanations of the supervised classifiers, and that the explanations have high readability, coherence, and informativeness.
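The one-shot Chain-of-Thought setup described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the demonstration URL, and the names `Example`, `build_one_shot_prompt`, and `parse_label` are hypothetical and do not reproduce the prompt used in the paper. The sketch only shows the general shape: one worked example with reasoning and a label, followed by the target URL, with the model left to continue the reasoning.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Example:
    """One worked demonstration shown to the model (the single 'shot')."""
    url: str
    reasoning: str
    label: str  # "benign" or "phishing"


# Hypothetical demonstration; not the example used by the authors.
DEMO = Example(
    url="http://paypa1-secure-login.example.com/verify",
    reasoning=(
        "The host imitates a known brand using a digit substitution, "
        "and the path asks the user to verify credentials."
    ),
    label="phishing",
)


def build_one_shot_prompt(url: str, demo: Example = DEMO) -> str:
    """Assemble a one-shot Chain-of-Thought prompt for a single URL."""
    return "\n".join([
        "Decide whether the address is benign or phishing. "
        "Think step by step, then give the label.",
        "",
        f"URL: {demo.url}",
        f"Reasoning: {demo.reasoning}",
        f"Label: {demo.label}",
        "",
        f"URL: {url}",
        "Reasoning:",
    ])


def parse_label(response: str) -> Optional[str]:
    """Return the last 'Label: ...' value in a response, or None if absent."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.lower().startswith("label:"):
            value = stripped.split(":", 1)[1].strip().lower()
            if value in ("benign", "phishing"):
                return value
    return None
```

The prompt string would be sent to whichever LLM is under evaluation; the call itself is omitted because it depends on the provider. Parsing the label from the final `Label:` line keeps the free-text reasoning available as the explanation.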

Paper Structure

This paper contains 24 sections, 1 equation, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Example URL classification prompt and the output
  • Figure 2: LLM-based One-Shot URL Classification Framework
  • Figure 3: Prompting the LLM to list benign and phishing indicators identified in the self-explanation
  • Figure 4: Using the G-Eval Framework to assess the quality of LLM self-explanations
  • Figure 5: Cumulative distribution of the Jaccard similarity between LIME and LLM indicators - HP Dataset
  • ...and 3 more figures
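
Figure 5 refers to the Jaccard similarity between LIME and LLM indicators. As a rough illustration, assuming each explanation has already been reduced to a set of indicator strings, the measure is the size of the intersection divided by the size of the union. The function name, the lowercase normalisation, and the value returned for two empty sets are assumptions of this sketch, not details taken from the paper.

```python
from typing import Iterable


def jaccard_similarity(lime_indicators: Iterable[str],
                       llm_indicators: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two indicator sets."""
    # Assumed normalisation: trim whitespace and compare case-insensitively.
    a = {item.strip().lower() for item in lime_indicators}
    b = {item.strip().lower() for item in llm_indicators}
    union = a | b
    if not union:
        # Assumed convention: two empty sets share no evidence, so score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical indicator sets for one URL.
lime = ["https", "ip address", "long path"]
llm = ["HTTPS", "suspicious subdomain", "long path"]
score = jaccard_similarity(lime, llm)  # 2 shared out of 4 distinct -> 0.5
```

A cumulative distribution like the one in Figure 5 would then be obtained by computing this score per URL and sorting the values.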