
Efficient Phishing URL Detection Using Graph-based Machine Learning and Loopy Belief Propagation

Wenye Guo, Qun Wang, Hao Yue, Haijian Sun, Rose Qingyang Hu

TL;DR

This work tackles phishing URL detection by integrating URL structural features with stable network-level indicators (IP addresses and authoritative NS records) in a heterogeneous graph. It applies Loopy Belief Propagation with a refined edge-potential mechanism and a novel convergence strategy that removes unknown cycles to ensure stable inference, yielding strong classification performance. Key contributions include (1) combining URL and network features for resilience to adversarial evasion, (2) dynamic edge potentials informed by entity similarity, (3) a convergence technique that accelerates and stabilizes inference, and (4) extensive reproducible experiments showing high $F_1$ scores on real-world datasets, reaching up to 98.77% in large-scale settings. The approach demonstrates scalable, robust phishing detection with practical impact for security systems and data-driven threat intelligence.
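The TL;DR describes a heterogeneous graph that joins URLs to network-level indicators. Below is a minimal sketch of how such a graph could be assembled. It assumes that each URL links to its hostname and that each hostname links to the IP addresses it resolves to and to its authoritative NS records. The paper's exact node and edge types may differ. The records, the feature set, and the names `RECORDS`, `url_features`, and `build_graph` are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical records: (URL, resolved IP addresses, authoritative NS records).
# A real pipeline would obtain the last two fields from DNS lookups.
RECORDS = [
    ("http://login.example-bank.com/verify?id=1", ["203.0.113.7"], ["ns1.cheaphost.net"]),
    ("https://www.example.org/about", ["198.51.100.20"], ["a.iana-servers.net"]),
    ("http://example-bank.com.secure-login.info/", ["203.0.113.7"], ["ns1.cheaphost.net"]),
]


def url_features(url):
    """A few illustrative URL-string features (not the paper's feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "length": len(url),
        "num_dots_in_host": host.count("."),
        "has_query": bool(parts.query),
        "uses_https": parts.scheme == "https",
    }


def build_graph(records):
    """Return an undirected heterogeneous graph as an adjacency dict.

    Nodes are (type, value) tuples so a URL, a domain, an IP and an NS
    record can never collide even if their strings happen to match.
    """
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for url, ips, nameservers in records:
        domain = ("domain", urlsplit(url).hostname)
        link(("url", url), domain)
        for ip in ips:
            link(domain, ("ip", ip))
        for ns in nameservers:
            link(domain, ("ns", ns))
    return adj


if __name__ == "__main__":
    graph = build_graph(RECORDS)
    # Both lookalike domains meet at the same IP node.
    print(sorted(graph[("ip", "203.0.113.7")]))
    print(url_features(RECORDS[0][0]))
```

Shared infrastructure shows up as shared neighbours. Both bank-lookalike hostnames connect through the same IP and NS nodes. That is the kind of link a graph-based method can exploit even when the URL strings themselves look different.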

Abstract

As mobile devices and online interactions proliferate, they are increasingly threatened by cyberattacks, among which phishing attacks and malicious Uniform Resource Locators (URLs) pose significant risks to user security. Traditional phishing URL detection methods rely primarily on URL string-based features, which attackers often manipulate to evade detection. To address these limitations, we propose a novel graph-based machine learning model for phishing URL detection that integrates both URL structure and network-level features such as IP addresses and authoritative name servers. Our approach leverages Loopy Belief Propagation (LBP) with an enhanced convergence strategy to enable effective message passing and stable classification in the presence of complex graph structures. Additionally, we introduce a refined edge potential mechanism that adapts dynamically to entity similarity and label relationships to further improve classification accuracy. Comprehensive experiments on real-world datasets demonstrate our model's effectiveness, achieving an F1 score of up to 98.77%. This robust and reproducible method advances phishing detection capabilities, offering enhanced reliability and valuable insights for the field of cybersecurity.
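The abstract names Loopy Belief Propagation with a similarity-dependent edge potential. The sketch below is plain synchronous sum-product LBP over two labels (0 = benign, 1 = phishing). Each message is computed as $m_{i\to j}(x_j) \propto \sum_{x_i} \phi_i(x_i)\,\psi_{ij}(x_i,x_j)\prod_{k\in N(i)\setminus j} m_{k\to i}(x_i)$. The sketch assumes a hypothetical potential in which same-label pairs get 0.5 + eps·s and different-label pairs get 0.5 - eps·s, for a similarity s in [0, 1]. The paper's actual potential and its cycle-removal convergence strategy are not reproduced here. A simple tolerance on message change stands in for the latter. The names `edge_potential`, `loopy_bp`, and `toy_graph`, and the toy data, are illustrative only.

```python
from itertools import product


def edge_potential(sim, eps=0.4):
    """Hypothetical similarity-scaled potential psi[x_i][x_j] for binary labels.

    Same-label pairs get 0.5 + eps*sim and different-label pairs 0.5 - eps*sim,
    so more similar entities are pushed harder toward sharing a label.
    Requires 0 <= sim <= 1 and 0 < eps < 0.5 to keep every entry positive.
    """
    same, diff = 0.5 + eps * sim, 0.5 - eps * sim
    return [[same, diff], [diff, same]]


def loopy_bp(neighbors, priors, sims, max_iter=50, tol=1e-6):
    """Synchronous sum-product LBP; labels are 0 = benign, 1 = phishing.

    neighbors: node -> list of adjacent nodes (undirected graph)
    priors:    node -> [phi(benign), phi(phishing)] node potential
    sims:      frozenset({i, j}) -> similarity of the edge's endpoints
    Returns (beliefs, number of iterations run).
    """
    # msgs[(i, j)] is the message from node i to node j, initialised uniform.
    msgs = {(i, j): [0.5, 0.5] for i in neighbors for j in neighbors[i]}
    it = 0
    for it in range(1, max_iter + 1):
        new_msgs, delta = {}, 0.0
        for i, j in msgs:
            psi = edge_potential(sims[frozenset((i, j))])
            out = [0.0, 0.0]
            for xi, xj in product((0, 1), repeat=2):
                # Product of messages into i from every neighbour except j.
                incoming = 1.0
                for k in neighbors[i]:
                    if k != j:
                        incoming *= msgs[(k, i)][xi]
                out[xj] += priors[i][xi] * psi[xi][xj] * incoming
            z = out[0] + out[1]
            out = [v / z for v in out]
            delta = max(delta, abs(out[0] - msgs[(i, j)][0]))
            new_msgs[(i, j)] = out
        msgs = new_msgs
        if delta < tol:  # stop once no message moved by more than tol
            break
    beliefs = {}
    for i in neighbors:
        b = list(priors[i])
        for k in neighbors[i]:
            b = [b[x] * msgs[(k, i)][x] for x in (0, 1)]
        z = b[0] + b[1]
        beliefs[i] = [v / z for v in b]
    return beliefs, it


def toy_graph():
    """A 4-cycle: two URLs sharing one IP and one NS record; u1 is labelled phishing."""
    edges = [("u1", "ip1"), ("u2", "ip1"), ("u1", "ns1"), ("u2", "ns1")]
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    priors = {n: [0.5, 0.5] for n in neighbors}
    priors["u1"] = [0.1, 0.9]
    sims = {frozenset(e): 1.0 for e in edges}
    return neighbors, priors, sims


if __name__ == "__main__":
    beliefs, iters = loopy_bp(*toy_graph())
    print(iters, beliefs["u2"])
```

The potential favours agreement more strongly as similarity grows. As a result, the phishing evidence on `u1` pulls the unlabelled `u2` toward the phishing label through the shared IP and NS nodes, even though the toy graph contains a cycle. With all priors uniform, every message stays uniform and the loop stops after a single iteration.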

Paper Structure

This paper contains 16 sections, 3 equations, 7 figures, and 8 tables.

Figures (7)

  • Figure 1: The overall workflow of the proposed method.
  • Figure 2: URL anatomy diagram.
  • Figure 3: Elbow method saturation point plot.
  • Figure 4: The workflow of convergence approach.
  • Figure 5: Data distribution visualization.
  • ...and 2 more figures