HMCGeo: IP Region Prediction Based on Hierarchical Multi-label Classification

Tianzi Zhao, Xinran Liu, Zhaoxin Zhang, Dong Zhao, Ning Li, Zhichao Zhang, Xinye Wang

TL;DR

HMCGeo is a novel framework that casts IP region prediction as a hierarchical multi-label classification problem. It significantly outperforms existing methods in region prediction at every granularity, and it achieves lower coordinate errors on most samples by taking a similarity-weighted average of candidate region centers.
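The similarity-weighted averaging mentioned above can be illustrated with a minimal sketch. The function name `weighted_center`, the `(lat, lon)` tuple layout, and the example values are assumptions made for illustration; how HMCGeo computes the similarities is not stated in this summary, so they are simply passed in as non-negative weights.

```python
def weighted_center(centers, similarities):
    """Similarity-weighted average of candidate region centers.

    centers: list of (lat, lon) tuples, one per candidate region (hypothetical layout).
    similarities: non-negative weights, one per candidate region.
    """
    if len(centers) != len(similarities):
        raise ValueError("centers and similarities must have the same length")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    # Weighted mean of each coordinate, normalized by the total weight.
    lat = sum(w * c[0] for w, c in zip(similarities, centers)) / total
    lon = sum(w * c[1] for w, c in zip(similarities, centers)) / total
    return lat, lon


# Illustrative values only: two candidate regions, the first three times as similar.
candidates = [(40.0, -74.0), (41.0, -73.0)]
weights = [3.0, 1.0]
estimate = weighted_center(candidates, weights)
```

Averaging latitude and longitude directly is a flat-plane approximation; it is reasonable at city scale, where candidate regions lie close together, but it would need a spherical mean over larger distances.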

Abstract

Fine-grained IP geolocation plays a critical role in applications such as location-based services and cybersecurity. Most existing fine-grained IP geolocation methods are regression-based; however, due to noise in the input data, these methods typically encounter kilometer-level prediction errors and provide incorrect region information for users. To address this issue, this paper proposes a novel hierarchical multi-label classification framework for IP region prediction, named HMCGeo. This framework treats IP geolocation as a hierarchical multi-label classification problem and employs residual connection-based feature extraction and attention prediction units to predict the target host region across multiple geographical granularities. Furthermore, we introduce probabilistic classification loss during training, combining it with hierarchical cross-entropy loss to form a composite loss function. This approach optimizes predictions by utilizing hierarchical constraints between regions at different granularities. IP region prediction experiments on the New York, Los Angeles, and Shanghai datasets demonstrate that HMCGeo achieves superior performance across all geographical granularities, significantly outperforming existing IP geolocation methods.
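As a rough illustration of a hierarchical cross-entropy term, the sketch below sums a standard softmax cross-entropy over each geographical granularity. This is an assumed form, not the paper's definition: the abstract does not give the formula, and the probabilistic classification loss that completes the composite loss is omitted because it is not specified here. The names `hierarchical_cross_entropy` and `path_is_consistent`, and the `parent_of` mapping, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def hierarchical_cross_entropy(logits_per_level, labels_per_level):
    """Sum of per-level cross-entropy losses (assumed form).

    logits_per_level: one list of class logits per granularity, coarse to fine.
    labels_per_level: the true region index at each granularity.
    """
    loss = 0.0
    for logits, label in zip(logits_per_level, labels_per_level):
        probs = softmax(logits)
        loss += -math.log(probs[label])
    return loss


def path_is_consistent(pred_per_level, parent_of):
    """Check that each predicted region is a child of the region predicted one level up.

    parent_of[k][child] is the parent index (at level k) of `child` at level k + 1.
    """
    return all(
        parent_of[k][pred_per_level[k + 1]] == pred_per_level[k]
        for k in range(len(pred_per_level) - 1)
    )


# Illustrative values only: 2 coarse regions, 4 fine regions, uniform logits.
loss = hierarchical_cross_entropy([[0.0, 0.0], [0.0, 0.0, 0.0, 0.0]], [0, 1])
# Fine regions 0 and 1 belong to coarse region 0; fine regions 2 and 3 to coarse region 1.
parents = [{0: 0, 1: 0, 2: 1, 3: 1}]
```

The consistency check shows the kind of hierarchical constraint between granularities that the abstract refers to: a fine-grained prediction is only plausible if it falls inside the coarse region predicted above it.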

Paper Structure

This paper contains 25 sections, 17 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Overview of IP Geolocation Methods.
  • Figure 2: Three typical examples with fewer than 2 clusters
  • Figure 3: Number of clusters more than 1
  • Figure 4: The architecture of HMCGeo.
  • Figure 5: Comparison of geolocation performance between HMCGeo and RIPGeo. Panels a, b, and c show the New York, Los Angeles, and Shanghai datasets under target aggregation; panels d, e, and f show the same three datasets under target dispersion.
  • ...and 2 more figures