Superflows: A New Tool for Forensic Network Flow Analysis

Michael Collins; Jyotirmoy V. Deshmukh; Dristi Dinesh; Mukund Raghothaman; Srivatsan Ravi; Yuan Xia

TL;DR

The paper addresses the data deluge in forensic network analysis by introducing superflows, a formalism that groups NetFlow records according to high-level hypotheses $h: 2^{Flow} \to Bool$ and forms maximal decompositions of flow sets. It provides an Alloy-like relational language for specifying flow attributes and predicates, along with efficient, linear-time decomposition under subset-closed, monitorable hypotheses. Through two case studies (modern webpage fetches and scans in dark-space traffic), the authors demonstrate substantial data-footprint reductions and improved analyst efficiency, and illustrate how multiple superflows can coexist over the same data. The work outlines future directions: building libraries for various traffic classes, handling vantage and confounders, and extending the framework to temporal patterns and causality, with the aim of broadening its practical impact in operational security.
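To make the idea of a hypothesis $h: 2^{Flow} \to Bool$ concrete, the following is a minimal Python sketch, not the authors' implementation or their Alloy-like language. The `Flow` fields, the `same_client_burst` hypothesis, the 5-second window, and the first-fit `decompose` routine are all assumptions chosen for illustration. The hypothesis is subset-closed: if a set of flows satisfies it, so does every subset. Under that assumption, a single first-fit pass yields groups of which no two can be merged without violating the hypothesis, which is one plausible reading of a maximal decomposition.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Flow:
    """A simplified flow record (hypothetical fields, not a full NetFlow schema)."""
    src: str      # source IP address
    dst: str      # destination IP address
    dport: int    # destination port
    start: float  # start time in seconds


# A hypothesis maps a set of flows to a Boolean.
Hypothesis = Callable[[List[Flow]], bool]


def same_client_burst(flows: List[Flow], window: float = 5.0) -> bool:
    """Illustrative hypothesis: all flows share one source address and
    start within `window` seconds of each other. It is subset-closed."""
    if not flows:
        return True
    same_src = len({f.src for f in flows}) == 1
    starts = [f.start for f in flows]
    return same_src and max(starts) - min(starts) <= window


def decompose(flows: Iterable[Flow], h: Hypothesis) -> List[List[Flow]]:
    """First-fit pass: put each flow into the first group that still
    satisfies h after the addition, otherwise open a new group.
    Assumes every single flow on its own satisfies h."""
    groups: List[List[Flow]] = []
    for flow in flows:
        for group in groups:
            if h(group + [flow]):
                group.append(flow)
                break
        else:
            groups.append([flow])
    return groups


# Addresses come from the reserved documentation ranges.
flows = [
    Flow("192.0.2.1", "198.51.100.10", 443, 0.0),
    Flow("192.0.2.1", "203.0.113.20", 443, 0.4),
    Flow("192.0.2.2", "198.51.100.10", 443, 1.0),
    Flow("192.0.2.1", "203.0.113.21", 443, 2.1),
    Flow("192.0.2.1", "198.51.100.10", 443, 60.0),
]
superflows = decompose(flows, same_client_burst)
for group in superflows:
    print(len(group), sorted({f.dst for f in group}))
```

In this toy run the five flow records collapse into three candidate superflows, so an analyst would review three groups rather than five records. The sketch re-evaluates the hypothesis from scratch on every candidate group; a monitorable hypothesis would instead keep a small running state per group (here, the source address and the minimum and maximum start times) so that each check takes constant time.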

Abstract

Network security analysts gather data from diverse sources, from high-level summaries of network flow and traffic volumes to low-level details such as service logs from servers and the contents of individual packets. They validate and check this data against traffic patterns and historical indicators of compromise. Based on the results of this analysis, a decision is made to either automatically manage the traffic or report it to an analyst for further investigation. Unfortunately, due to rapidly increasing traffic volumes, there are far more events to check than operational teams can handle for effective forensic analysis. However, just as packets are grouped into flows that share a commonality, we argue that a high-level construct for grouping network flows into a set of flows that share a hypothesis is needed to significantly improve the quality of operational network response by increasing Events Per Analyst Hour (EPAH). In this paper, we propose a formalism for describing a superflow construct, which we characterize as an aggregation of one or more flows based on an analyst-specific hypothesis about traffic behavior. We demonstrate simple superflow constructions and representations, and perform a case study to explain how the formalism can be used to reduce the volume of data for forensic analysis.

Paper Structure (15 sections, 6 equations, 6 figures)

Introduction
Motivation and Related Work
Superflow Decompositions
A Language for Superflow Hypotheses
Attributes and predicates over flows.
Relational constraints over multiple flows.
Efficiently Decomposing Flow Streams
Superflow-guided Data Reduction
Estimating Data Footprints
Modern Webpage Analysis
Scan Analysis
Discussion and Future Directions
Building new superflow class libraries.
Vantage and confounders.
Expanding the scope of superflow constructs.

Figures (6)

Figure 1: The Basic Footprints for NetFlow as collected by the router or through PCAP
Figure 2: Discrete Sites Contacted to Fetch CNN's Homepage
Figure 3: Footprints for Modern Website representation
Figure 4: Footprints for Full and Allotted Scan-256
Figure 5: Flow Reduction for Full Scan-256 Superflow
...and 1 more figure

Theorems & Definitions (4)

Example 1: Chat session hypothesis
Example 2: Webpage fetch hypothesis
Claim 3: Efficient hypothesis monitoring
Claim 4: Superflow decomposition
