Dynamic Cluster Analysis to Detect and Track Novelty in Network Telescopes

Kai Huang; Luca Gioacchini; Marco Mellia; Luca Vassio

TL;DR

This work tackles dynamic novelty discovery in network telescope traffic with a three-stage pipeline: (1) self-supervised sender embeddings via i-DarkVec, (2) unsupervised clustering of the embeddings with HDBSCAN, and (3) Dynamic Cluster Analysis (DCA), a MONIC-inspired framework that uses the overlap metric $OL(X,Y)=|X\cap Y|/|X|$ and thresholds $\tau_0=0.65$, $\tau_1=0.3$ to track cluster evolution and identify novelties across daily snapshots. Applied to 20 days of telescope data comprising over 100 million packets from ~785k senders, the method detects 50-70 clusters per day, matches 60-70% of them to past activity, and highlights 10-20 newly emerged clusters per day, significantly easing the analysts' workload. The approach yields 216 coordinated activities, 37.5% of which are labeled, and its robustness is illustrated with examples of Shadowserver and Censys patterns as well as several novelty cases, such as SIP/RDP scanners in Google Cloud and GRE DDoS backscatter. Overall, the framework provides a scalable, interpretable mechanism to monitor evolving coordinated activity and surface novel threats for timely human review; planned extensions target real-time deployment and richer feature integration.
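As an illustration of how the overlap metric and the two thresholds can drive MONIC-style transition detection between two daily snapshots, here is a minimal sketch. It assumes each cluster is a set of sender IPs and that one day's clusters are disjoint (as in a flat HDBSCAN partition). It also assumes, for illustration only, that $\tau_0$ acts as the match threshold and $\tau_1$ as the per-part split threshold; the function names `overlap` and `track`, the transition labels, and the omission of MONIC's absorption case are hypothetical simplifications, not the authors' implementation. Any current cluster that no past cluster survives into or splits into is flagged as new, i.e. a novelty candidate for analyst review.

```python
from typing import Dict, List, Set

Cluster = Set[str]  # a cluster is modelled as a set of sender IP addresses


def overlap(x: Cluster, y: Cluster) -> float:
    """OL(X, Y) = |X ∩ Y| / |X|: share of X's senders that also appear in Y."""
    return len(x & y) / len(x) if x else 0.0


def track(prev: Dict[str, Cluster], curr: Dict[str, Cluster],
          tau0: float = 0.65, tau1: float = 0.3) -> Dict[str, List[str]]:
    """Label transitions from yesterday's clusters (prev) to today's (curr).

    Assumption: tau0 is the match threshold, tau1 the per-part split threshold.
    MONIC's absorption case is omitted for brevity.
    """
    result: Dict[str, List[str]] = {"survived": [], "split": [], "disappeared": [], "new": []}
    explained: Set[str] = set()  # today's clusters linked to some past cluster
    for pid, x in prev.items():
        scores = {cid: overlap(x, y) for cid, y in curr.items()}
        best = max(scores, key=scores.get, default=None)
        if best is not None and scores[best] >= tau0:
            # Enough of X lives on in a single cluster: X survives as `best`.
            result["survived"].append(f"{pid}->{best}")
            explained.add(best)
            continue
        parts = sorted(cid for cid, s in scores.items() if s >= tau1)
        if len(parts) > 1 and sum(scores[c] for c in parts) >= tau0:
            # X is spread over several sizeable pieces that jointly cover it.
            result["split"].append(f"{pid}->{'+'.join(parts)}")
            explained.update(parts)
        else:
            result["disappeared"].append(pid)
    # Anything today that no past cluster maps to is a novelty candidate.
    result["new"] = sorted(cid for cid in curr if cid not in explained)
    return result


if __name__ == "__main__":
    prev = {"A": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
            "B": {"10.0.1.1", "10.0.1.2", "10.0.1.3", "10.0.1.4"}}
    curr = {"C1": {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.9"},
            "C2": {"10.0.1.1", "10.0.1.2"},
            "C3": {"10.0.1.3", "10.0.1.4"},
            "C4": {"192.0.2.7"}}
    print(track(prev, curr))
    # {'survived': ['A->C1'], 'split': ['B->C2+C3'], 'disappeared': [], 'new': ['C4']}
```

Because $OL$ is normalized by $|X|$, it is asymmetric: $OL(X,Y)$ measures how much of the past cluster persists in $Y$, regardless of how many extra senders $Y$ has gained. With $\tau_0 > 0.5$ and disjoint clusters, at most one current cluster can reach the match threshold, so the survival branch is never ambiguous.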

Abstract

In the context of cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both participants and their behaviours evolve at a fast pace. We address this scenario by solving a dynamic novelty discovery problem with the aim of both re-identifying patterns seen in the past and highlighting new patterns. We focus on traffic collected by Network Telescopes, a primary and noisy source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts through their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track the temporal evolution of clusters to highlight novel patterns. We apply our methodology to 20 days of telescope traffic during which we observe more than 8 thousand active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, some of which we document. In short, our novelty discovery methodology greatly simplifies the manual analysis security analysts must conduct to interpret novel coordinated activities.

Paper Structure (15 sections, 1 equation, 8 figures, 2 tables)

Introduction
Network Traffic Analysis Pipeline
Network Telescope Sensor
Self-Supervised Upstream Task: Sender Embeddings
Downstream Task: Unsupervised Clustering
Dynamic Cluster Analysis
Ground Truth for Testing
Experimental Results
Generic Clustering Results
Overall Results of DCA
Example of Labelled Cluster Evolution
Novelty Examples
Detailed Examples
Other Examples
Conclusions

Figures (8)

Figure 1: 3-stage pipeline for dynamic novelty discovery.
Figure 2: Daily senders on the 20-day dataset.
Figure 3: ECCDF of number of active days per sender on the 20-day dataset. Only 20% of the senders are active for 3 days or more.
Figure 4: Statistics of clustering results.
Figure 5: Dynamic cluster analysis.
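Figure 3 summarises sender persistence with an ECCDF of active days. A minimal sketch of how such a curve can be computed, assuming the raw data is available as (day, sender) observations; the helper names `active_days` and `eccdf` and the toy records are hypothetical. The ECCDF is defined here as $P(X \ge v)$, which matches the caption's "3 days or more" phrasing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def active_days(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Number of distinct days each sender was seen; records are (day, sender)."""
    days: Dict[str, Set[str]] = {}
    for day, sender in records:
        days.setdefault(sender, set()).add(day)  # repeated sightings on a day count once
    return {sender: len(seen) for sender, seen in days.items()}


def eccdf(values: Iterable[int]) -> List[Tuple[int, float]]:
    """Empirical CCDF: (v, fraction of samples with value >= v) per distinct v."""
    vals = list(values)
    n = len(vals)
    counts = Counter(vals)
    curve: List[Tuple[int, float]] = []
    remaining = n  # number of samples whose value is >= the current v
    for v in sorted(counts):
        curve.append((v, remaining / n))
        remaining -= counts[v]
    return curve


if __name__ == "__main__":
    toy = [("d1", "a"), ("d2", "a"), ("d3", "a"), ("d1", "b"),
           ("d1", "c"), ("d2", "c"), ("d2", "d"), ("d2", "d")]
    print(eccdf(active_days(toy).values()))
    # [(1, 1.0), (2, 0.5), (3, 0.25)]
```

Reading the curve at $v = 3$ gives the share of senders active on 3 or more days, which is the quantity the Figure 3 caption quotes (about 20% on the real dataset).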