Linnaeus: A Hierarchical, Multi-Label Framework for Autonomous System Classification

Marcos Piotto, Ignacio Schuemer, Santiago T. Torres, Mariano G. Beiró, Esteban Carisimo, Fabián E. Bustamante

Abstract

Autonomous systems (ASes) play diverse roles in today's Internet, from community and research backbones to hyperscale content providers and submarine-cable operators. However, existing taxonomies based solely on network-level features fail to capture their semantic and operational heterogeneity. In this paper, we present Linnaeus, a hierarchical AS-classification framework that combines network-centric data (e.g., topology, BGP announcements) with rich non-network features and leverages domain-adapted large language models alongside traditional machine-learning techniques. Linnaeus provides a two-level taxonomy with 18 top-level and 38 second-level classes, supports multi-label assignments to reflect hybrid roles (e.g., research backbone and transit provider), and provides an end-to-end pipeline from data ingestion to label inference. On a manually annotated dataset of nearly 2,000 ASes, Linnaeus achieves an overall precision and recall of 0.83 and 0.76, respectively. We further demonstrate its practical value through case studies, highlighting Linnaeus's potential to reveal both structural and semantic dimensions of Internet infrastructure.

Paper Structure (40 sections, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Linnaeus's taxonomy: top-level and sub-level categories
  • Figure 2: Model architecture for the multilabel tag prediction task.
  • Figure 3: Pipeline for training and evaluation of the top-level model and each sub-level model.
  • Figure 4: Top-level distribution for the 118,695 tagged ASes.
  • Figure 5: Sub-level tags for government networks and for educational and research networks.
  • ...and 1 more figure