siForest: Detecting Network Anomalies with Set-Structured Isolation Forest

Christie Djidjev

TL;DR

This work tackles anomaly detection in set-structured network scan data by introducing siForest, a Set-Partitioned Isolation Forest that preserves IP-level groupings and aggregates anomaly scores per IP address. It evaluates siForest against standard iForest variants using two preprocessing strategies (flattening and summarization) on synthetic data modeled on realistic Censys-like scans. The results show that siForest performs robustly across anomaly types and does best where port-service relationships are crucial, while the choice of preprocessing affects performance on specific anomaly types. The study presents siForest as a potentially practical tool for attack-surface identification in cybersecurity, with future directions including real-world validation and integration with graph-based techniques.
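To make the two preprocessing strategies concrete, here is a minimal sketch of how set-structured scan records might be flattened or summarized before being passed to a standard iForest. The record layout `(ip, port, service)`, the helper names `flatten` and `summarize`, and the chosen summary statistics are all assumptions made for illustration; the paper's actual feature set is not given here.

```python
from collections import defaultdict

# Hypothetical scan records: (ip, port, service). Field names are assumptions.
scans = [
    ("10.0.0.1", 22, "ssh"),
    ("10.0.0.1", 80, "http"),
    ("10.0.0.1", 443, "https"),
    ("10.0.0.2", 80, "http"),
    ("10.0.0.3", 22, "http"),    # unusual port-service pairing
    ("10.0.0.3", 8080, "http"),
]


def flatten(records):
    """Flattening: one feature row per scan; the IP survives only as a label."""
    return [
        {"ip": ip, "port": port, "service": service}
        for ip, port, service in records
    ]


def summarize(records):
    """Summarization: one feature row per IP, built from aggregate statistics."""
    by_ip = defaultdict(list)
    for ip, port, service in records:
        by_ip[ip].append((port, service))

    rows = []
    for ip, items in sorted(by_ip.items()):
        ports = [port for port, _ in items]
        rows.append({
            "ip": ip,
            "n_scans": len(items),
            "n_services": len({service for _, service in items}),
            "min_port": min(ports),
            "max_port": max(ports),
        })
    return rows


flat_rows = flatten(scans)        # 6 rows, one per scan
summary_rows = summarize(scans)   # 3 rows, one per IP
```

In this toy data the summary row for `10.0.0.3` no longer records that port 22 was paired with `http`, which illustrates how summarization can hide port-service relationships, while flattening keeps the pairing but drops the grouping of scans by IP.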

Abstract

As cyber threats continue to evolve in sophistication and scale, the ability to detect anomalous network behavior has become critical for maintaining robust cybersecurity defenses. Modern cybersecurity systems must analyze billions of daily network interactions to identify potential threats, which makes efficient and accurate anomaly detection algorithms crucial for network defense. This paper investigates variants of the Isolation Forest (iForest) machine learning algorithm for detecting anomalies in internet scan data. In particular, it presents the Set-Partitioned Isolation Forest (siForest), a novel extension of iForest designed to detect anomalies in set-structured data. By treating a set of instances, such as multiple network scans that share an IP address, as a single cohesive unit, siForest addresses some of the challenges of analyzing complex, multidimensional datasets. Extensive experiments on synthetic datasets simulating diverse anomaly scenarios in network traffic demonstrate that siForest has the potential to outperform traditional approaches on some types of internet scan data.
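Treating all scans from one IP address as a unit implies that a score is eventually reported per IP rather than per scan. The sketch below shows one simple way to do that, by reducing hypothetical per-scan anomaly scores to one value per IP. The function `aggregate_by_ip`, the default mean reducer, and the score values are assumptions for illustration, not the aggregation rule used by siForest.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_ip(scored_scans, reducer=mean):
    """Collapse (ip, score) pairs into one score per IP using `reducer`."""
    groups = defaultdict(list)
    for ip, score in scored_scans:
        groups[ip].append(score)
    return {ip: reducer(scores) for ip, scores in groups.items()}


# Hypothetical per-scan anomaly scores (higher means more anomalous).
scored = [
    ("10.0.0.1", 0.42),
    ("10.0.0.1", 0.46),
    ("10.0.0.2", 0.40),
    ("10.0.0.3", 0.71),
    ("10.0.0.3", 0.55),
]

ip_scores = aggregate_by_ip(scored)
# IPs ordered from most to least anomalous under the mean reducer.
ranked = sorted(ip_scores, key=ip_scores.get, reverse=True)
```

Passing `reducer=max` instead would flag an IP as soon as any single scan looks unusual, whereas the mean rewards IPs whose scans are anomalous on the whole; which behavior is preferable depends on the anomaly type of interest.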
