Dynamic Cluster Analysis to Detect and Track Novelty in Network Telescopes
Kai Huang, Luca Gioacchini, Marco Mellia, Luca Vassio
TL;DR
This work tackles dynamic novelty discovery in network telescope traffic with a three-stage pipeline: (1) self-supervised sender embeddings via i-DarkVec, (2) unsupervised clustering of the embeddings with HDBSCAN, and (3) Dynamic Cluster Analysis (DCA), a MONIC-inspired framework that uses the overlap metric $OL(X,Y)=|X\cap Y|/|X|$ with thresholds $\tau_0=0.65$ and $\tau_1=0.3$ to track cluster evolution and identify novelties across daily snapshots. Applied to 20 days of telescope data with over 100 million packets from ~785k senders, the method detects 50-70 clusters per day, matches 60-70% of them to previously seen activity, and highlights 10-20 newly emerged clusters daily, which narrows the set of clusters an analyst has to inspect by hand. The approach identifies 216 coordinated activities, 37.5% of which are labeled. Its robustness is illustrated with recurring Shadowserver and Censys patterns and with several novelty cases, such as SIP/RDP scanners in Google Cloud and GRE DDoS backscatter. Overall, the framework provides a scalable, interpretable mechanism to monitor evolving coordination patterns and surface novel threats for timely human review, with planned extensions toward real-time deployment and richer feature integration.
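To make the overlap metric concrete, the sketch below computes $OL(X,Y)$ on sets of sender identifiers and applies a simplified MONIC-style transition rule. Only the formula and the two threshold values come from the summary above. The function names, the sender labels, and the exact way $\tau_0$ and $\tau_1$ are used (match when one cluster reaches $\tau_0$; split when several clusters each reach $\tau_1$ and jointly reach $\tau_0$) are assumptions made for illustration, not necessarily the authors' rule.

```python
from typing import Dict, List, Set, Tuple

TAU_0 = 0.65  # threshold value from the summary; its role as "match" threshold is assumed
TAU_1 = 0.3   # threshold value from the summary; its role as "split" threshold is assumed


def overlap(x: Set[str], y: Set[str]) -> float:
    """OL(X, Y) = |X ∩ Y| / |X|. Returns 0.0 for an empty X."""
    return len(x & y) / len(x) if x else 0.0


def classify(past: Set[str], current: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Label what happened to one past cluster, given today's clusters.

    Hypothetical, simplified MONIC-style rule:
      - "survived": a single current cluster covers at least TAU_0 of `past`;
      - "split": two or more current clusters each cover at least TAU_1,
        and their union covers at least TAU_0;
      - "disappeared": otherwise.
    """
    scores = {name: overlap(past, members) for name, members in current.items()}
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] >= TAU_0:
        return "survived", [best]

    parts = [name for name, score in scores.items() if score >= TAU_1]
    if len(parts) >= 2:
        union: Set[str] = set().union(*(current[name] for name in parts))
        if overlap(past, union) >= TAU_0:
            return "split", parts
    return "disappeared", []


# Toy example with made-up sender labels s1..s10.
past_cluster = {f"s{i}" for i in range(1, 11)}

today_a = {"A": {f"s{i}" for i in range(1, 8)} | {"s20"}, "B": {"s8", "s9"}}
today_b = {"A": {"s1", "s2", "s3", "s4"}, "B": {"s5", "s6", "s7", "s8"}}
today_c = {"A": {"s1", "s2"}}

result_a = classify(past_cluster, today_a)
result_b = classify(past_cluster, today_b)
result_c = classify(past_cluster, today_c)
```

Note that $OL$ is asymmetric: it is normalized by the size of the first argument, so a small past cluster absorbed into a much larger current one still scores close to 1.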
Abstract
In the context of cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both the participants and their behaviours evolve at a fast pace. We address this scenario by solving a dynamic novelty discovery problem, with the aim of both re-identifying patterns seen in the past and highlighting new ones. We focus on traffic collected by Network Telescopes, a primary but noisy source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts from their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track the temporal evolution of the clusters to highlight novel patterns. We apply our methodology to 20 days of telescope traffic during which we observe more than 8 thousand active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, some of which we document. In short, our novelty discovery methodology greatly simplifies the manual analysis that security analysts must conduct to interpret novel coordinated activities.
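Stage (iii) can be pictured as a daily bookkeeping loop: each day's clusters are compared with every cluster seen so far, and those without a sufficiently overlapping predecessor are flagged as novel. The sketch below is a minimal illustration under stated assumptions. The `track` function, the comparison direction, the flat history of all past clusters, and the reuse of 0.65 as the matching threshold are hypothetical choices, not the authors' implementation.

```python
from typing import Dict, List, Set

MATCH_THRESHOLD = 0.65  # assumed matching threshold, borrowed from tau_0 in the TL;DR


def overlap(x: Set[str], y: Set[str]) -> float:
    """OL(X, Y) = |X ∩ Y| / |X|. Returns 0.0 for an empty X."""
    return len(x & y) / len(x) if x else 0.0


def track(days: List[Dict[str, Set[str]]]) -> List[List[str]]:
    """For each daily snapshot, return the names of clusters flagged as novel.

    A cluster is treated as already seen when some cluster from a previous
    day overlaps it by at least MATCH_THRESHOLD (hypothetical rule).
    """
    known: List[Set[str]] = []   # every cluster observed on earlier days
    novel_per_day: List[List[str]] = []
    for clusters in days:
        novel = [
            name
            for name, members in clusters.items()
            if not any(overlap(old, members) >= MATCH_THRESHOLD for old in known)
        ]
        novel_per_day.append(novel)
        known.extend(clusters.values())  # today's clusters become history
    return novel_per_day


# Toy snapshots with made-up sender labels.
snapshots = [
    {"c1": {"a", "b", "c"}},
    {"c1": {"a", "b", "c", "d"}, "c2": {"x", "y"}},
    {"c3": {"x", "y", "z"}},
]
novel = track(snapshots)
```

On the first day every cluster is novel by construction, so in practice the first snapshot acts as a warm-up before the daily novelty counts become meaningful.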
