CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting

Josef Koumar, Karel Hynek, Tomáš Čejka, Pavel Šiška

TL;DR

This paper introduces a comprehensive dataset derived from 40 weeks of traffic transmitted by 275,000 active IP addresses in the CESNET3 network. The dataset reflects the variability typical of an ISP environment and provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.

Abstract

Anomaly detection in network traffic is crucial for maintaining the security of computer networks and identifying malicious activities. Forecasting-based methods are one of the primary approaches to anomaly detection. Nevertheless, extensive real-world network datasets for forecasting and anomaly detection techniques are missing, which can lead to overestimating the performance of anomaly detection algorithms. This manuscript addresses this gap by introducing a dataset comprising time series data of network entities' behavior, collected from the CESNET3 network. The dataset was created from 40 weeks of network traffic of 275 thousand active IP addresses. The ISP origin of the presented data ensures a high level of variability among network entities, which forms a unique and authentic challenge for forecasting and anomaly detection models. The dataset provides valuable insights into the practical deployment of forecast-based anomaly detection approaches.

Paper Structure (25 sections, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Topology of the CESNET3 network, which interconnects academic institutions in the Czech Republic
  • Figure 2: Architecture of dataset collection from the CESNET3 network
  • Figure 3: This diagram describes the aggregation process for capturing time series of length $m$ from network traffic. For each packet $Packet_{i,j}$ there exists a flow $Flow_{i,k}$ to which the packet belongs. Furthermore, $a \ge g$ always holds (and similarly $b \ge h$, and so on), and in most cases $a$ is much larger than $g$. Moreover, a single IP flow commonly contains all packets from one connection ($g = 0$), for example for connections generated by a user visiting a web page. Similarly, a time series datapoint is a combination of one or more IP flows.
  • Figure 4: The file structure of the CESNET-TimeSeries24 dataset.
  • Figure 5: The evolution of the number of active IP addresses for each day of the dataset
  • ...and 5 more figures
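
The Figure 3 caption says that a time series datapoint is a combination of one or more IP flows. The sketch below illustrates that idea only; it is not the authors' collection pipeline. The `Flow` record, its field names, the 600-second window, and the rule of assigning a flow to the window containing its start time are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical IP flow record; field names are illustrative only."""
    ip: str         # IP address the flow is attributed to
    start: int      # flow start time, in seconds since the start of capture
    n_packets: int  # packets summarized by this flow
    n_bytes: int    # bytes summarized by this flow


def aggregate_flows(flows, window_s=600):
    """Combine flows into per-IP time series datapoints.

    Each datapoint covers one window of `window_s` seconds and sums the
    flows whose start time falls inside it (an assumed assignment rule).
    Returns {ip: {window_index: {"n_flows", "n_packets", "n_bytes"}}}.
    """
    series = defaultdict(
        lambda: defaultdict(
            lambda: {"n_flows": 0, "n_packets": 0, "n_bytes": 0}
        )
    )
    for flow in flows:
        window = flow.start // window_s  # index of the window the flow starts in
        point = series[flow.ip][window]
        point["n_flows"] += 1
        point["n_packets"] += flow.n_packets
        point["n_bytes"] += flow.n_bytes
    # Convert to plain dicts with windows in chronological order.
    return {ip: dict(sorted(points.items())) for ip, points in series.items()}


# Made-up flows: three for one address, one for another.
example_flows = [
    Flow("10.0.0.1", start=5, n_packets=3, n_bytes=300),
    Flow("10.0.0.1", start=590, n_packets=1, n_bytes=40),
    Flow("10.0.0.1", start=610, n_packets=2, n_bytes=120),
    Flow("10.0.0.2", start=0, n_packets=7, n_bytes=900),
]
result = aggregate_flows(example_flows)
```

In this toy input, the first two flows of `10.0.0.1` start inside the same window and merge into one datapoint, while the third starts a new one. Windows in which an address has no flows simply do not appear in the result; how such gaps should be handled is a separate design decision that this sketch does not address.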