ASINT: Learning AS-to-Organization Mapping from Internet Metadata
Yongzhe Xu, Weitong Li, Eeshan Umrani, Taejoong Chung
TL;DR
ASINT tackles the problem of accurately mapping Autonomous System Numbers (ASNs) to real organizations, a task hampered by fragmented registries and evolving corporate structures. It fuses WHOIS and PeeringDB signals with curated open-web evidence and uses retrieval-augmented LLM inference to identify two relation types (aliases and directed parent-child ties). It then applies a conservative second-pass validation and incorporates operator feedback. At scale, ASINT maps 112,172 ASNs into 82,840 organization families, with more cross-RIR unifications and multi-AS groupings than prior datasets, and achieves high validation metrics (precision 0.9608, recall 0.9915, accuracy 0.9752). Public deployment drew operator reports covering 595 ASNs across 106 organizations, with only 6 errors found. The richer organization context improves downstream measurement and security analyses (e.g., +27.5% intra-organization RPKI misconfigurations detected, -9.4% benign hijack alerts, and -5.9% in cases mislabeled as IP leasing). The data pipeline is auditable, refreshed through operator feedback, and model-agnostic, so it remains robust as base models evolve.
Abstract
Accurate AS-to-organization mapping underpins Internet measurement and security, yet registries are fragmented, PeeringDB has narrow coverage, and routing views reflect connectivity rather than ownership. We take a pragmatic step: ASINT integrates curated web evidence with retrieval-guided LLM techniques and strict, evidence-cited validation to infer two relation types (aliases and directed parent-child links), and then revalidates them conservatively. To keep the dataset sustainable, we operate a public dashboard and API where operators can inspect per-ASN evidence and submit feedback that seeds refreshes. At scale, ASINT maps 112,172 ASNs into 82,840 organization families. On overlapping AS sets, it yields fewer, larger families, with 21-24% more multi-AS groups than prior datasets (i.e., CAIDA AS2Org [11], AS2ORG+ [4], AS-Sibling [10], and Borges [28]). Quality is high in practice: under manual validation, ASINT achieves a precision of 0.9608, a recall of 0.9915, and an accuracy of 0.9752. Public deployment further drew operator-submitted reports for 595 ASNs across 106 organizations, from network operators in all RIR regions, with only 6 errors (99.0% observed clustering accuracy). Better organization context improves downstream analyses: +27.5% intra-organization RPKI misconfiguration detections, -9.4% benign hijack alerts, and -5.9% in cases mislabeled as IP leasing. We release code, datasets, and the operator platform with APIs. Because organizational names remain ambiguous and corporate structures continually evolve, an operator-in-the-loop process is essential: the platform records per-ASN feedback with provenance and incorporates it into periodic refreshes and retraining. The methodology is model-agnostic and stands to improve further as base LLMs advance.
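To make the notion of an organization family concrete, here is a minimal Python sketch. It is not ASINT's implementation. It assumes that a family is the connected component formed when alias pairs and parent-child pairs are both treated as merge edges, with parent-child direction ignored for grouping. The names `UnionFind` and `organization_families` are hypothetical, and the ASNs come from the documentation range 64496-64511 reserved by RFC 5398, with invented relations.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register an unseen ASN lazily as a singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Walk up to the root of x's set.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def organization_families(asns, aliases, parent_child):
    """Hypothetical helper: group ASNs into organization families.

    asns: all ASNs to place (unrelated ones become singleton families).
    aliases: (asn_a, asn_b) pairs judged to belong to the same organization.
    parent_child: (parent_asn, child_asn) directed pairs; direction is
    ignored here (an assumption) because both edge types merge families.
    """
    uf = UnionFind()
    for asn in asns:
        uf.find(asn)
    for a, b in aliases:
        uf.union(a, b)
    for parent, child in parent_child:
        uf.union(parent, child)
    groups = defaultdict(set)
    for asn in list(uf.parent):
        groups[uf.find(asn)].add(asn)
    # Deterministic output: sort members, then sort families by smallest ASN.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    # Documentation-range ASNs (RFC 5398); the relations are invented.
    families = organization_families(
        asns=[64496, 64497, 64498, 64499, 64500],
        aliases=[(64496, 64497)],
        parent_child=[(64498, 64499)],
    )
    print(families)  # [[64496, 64497], [64498, 64499], [64500]]
    print(sum(1 for f in families if len(f) > 1))  # 2
```

Under this assumption, each newly accepted relation can only merge families. That is how one dataset can report fewer, larger families while also counting more multi-AS groups than another.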
