Hypersparse Traffic Matrices from Suricata Network Flows using GraphBLAS
Michael Houle, Michael Jones, Dan Wallmeyer, Risa Brodeur, Justin Burr, Hayden Jananthan, Sam Merrell, Peter Michaleas, Anthony Perez, Andrew Prout, Jeremy Kepner
TL;DR
Addresses the challenge of scalable analysis of hypersparse network traffic by constructing traffic matrices from Suricata flow records with GraphBLAS. A json2grb pipeline converts flow JSON into directed edge counts in a $2^{32}\times 2^{32}$ address space, with addresses anonymized via CryptoPAN and batching in windows of $2^{17}$ packets. Key contributions include a low-memory ingestion workflow, TAR-archived batches, and LZ4-compressed storage of the hypersparse matrices, with matrix construction handled by GraphBLAS routines. The results show high throughput and modest memory and storage requirements, supporting practical deployment of privacy-preserving, sensor-based traffic analysis.
Abstract
Hypersparse traffic matrices constructed from network packet source and destination addresses are a powerful tool for gaining insights into network traffic. SuiteSparse:GraphBLAS, an open source package for building, manipulating, and analyzing large hypersparse matrices, is one approach to constructing these traffic matrices. Suricata is a widely used open source network intrusion detection software package. This work demonstrates how Suricata network flow records can be used to efficiently construct hypersparse matrices using GraphBLAS.
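
To make the core idea concrete, the following minimal sketch shows how flow records could be turned into directed edge counts indexed by 32-bit IPv4 addresses. It is an illustration under stated assumptions, not the json2grb implementation: the function name `flows_to_edge_counts` is hypothetical, a Python `Counter` stands in for a GraphBLAS hypersparse matrix, each flow record is counted once per (source, destination) pair, and anonymization, batching, and compression are omitted. The field names `event_type`, `src_ip`, and `dest_ip` follow Suricata's EVE JSON output.

```python
import ipaddress
import json
from collections import Counter


def flows_to_edge_counts(lines):
    """Accumulate directed (src, dst) counts from Suricata EVE JSON lines.

    Hypothetical helper: the Counter is a stand-in for a hypersparse
    2^32 x 2^32 matrix; only nonzero entries are ever stored.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Keep only flow events; EVE logs also carry dns, alert, etc.
        if record.get("event_type") != "flow":
            continue
        try:
            src = ipaddress.ip_address(record["src_ip"])
            dst = ipaddress.ip_address(record["dest_ip"])
        except (KeyError, ValueError):
            continue  # skip records with missing or malformed addresses
        # Assumption: restrict to IPv4 so indices fit in 32 bits.
        if src.version != 4 or dst.version != 4:
            continue
        # Row index = source address, column index = destination address.
        counts[(int(src), int(dst))] += 1
    return counts


sample = [
    '{"event_type": "flow", "src_ip": "10.0.0.1", "dest_ip": "10.0.0.2"}',
    '{"event_type": "flow", "src_ip": "10.0.0.1", "dest_ip": "10.0.0.2"}',
    '{"event_type": "dns", "src_ip": "10.0.0.3", "dest_ip": "10.0.0.2"}',
    '{"event_type": "flow", "src_ip": "10.0.0.2", "dest_ip": "10.0.0.1"}',
]
edge_counts = flows_to_edge_counts(sample)
```

In a GraphBLAS setting, the keys and values of `edge_counts` would correspond to the row indices, column indices, and values used to build the matrix, with duplicate entries summed.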
