Superflows: A New Tool for Forensic Network Flow Analysis

Michael Collins, Jyotirmoy V. Deshmukh, Dristi Dinesh, Mukund Raghothaman, Srivatsan Ravi, Yuan Xia

TL;DR

The paper tackles the data deluge in forensic network analysis by introducing superflows, a formalism that groups NetFlow records under high-level hypotheses $h: 2^{Flow} \to Bool$ and forms maximal decompositions of flow sets. It provides an Alloy-like relational language for specifying flow attributes and predicates, along with efficient, linear-time decomposition when hypotheses are subset-closed and monitorable. Through two case studies (modern webpage fetches and scans in dark-space traffic), the authors demonstrate substantial data-footprint reductions and improved analyst efficiency, and illustrate how multiple superflows can coexist over the same data. The work outlines future directions, including building libraries for various traffic classes, handling vantage and confounders, and extending the framework to temporal patterns and causality to broaden its practical impact in operational security.
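To make the idea of a hypothesis over sets of flows concrete, here is a minimal Python sketch. It is not the paper's algorithm or its relational language: the `Flow` fields, the `same_source_within` hypothesis, and the naive greedy `decompose` routine are all assumptions chosen for illustration. The example hypothesis is subset-closed (any subset of a satisfying set also satisfies it), which is the property the summary above ties to efficient decomposition. The greedy loop here makes no attempt at linear time.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Flow:
    # Hypothetical, simplified flow record; real NetFlow has more fields.
    src: str
    dst: str
    dport: int
    start: float  # start time in seconds


# A hypothesis maps a set of flows to a Boolean, i.e. h: 2^Flow -> Bool.
Hypothesis = Callable[[FrozenSet[Flow]], bool]


def same_source_within(window: float) -> Hypothesis:
    """Illustrative hypothesis: all flows share one source address and
    start within `window` seconds of each other. It is subset-closed."""
    def h(flows: FrozenSet[Flow]) -> bool:
        if not flows:
            return True
        sources = {f.src for f in flows}
        starts = [f.start for f in flows]
        return len(sources) == 1 and max(starts) - min(starts) <= window
    return h


def decompose(flows: Iterable[Flow], h: Hypothesis) -> List[FrozenSet[Flow]]:
    """Naive greedy grouping: each flow joins the first existing group that
    still satisfies h after adding it, otherwise it starts a new group."""
    groups: List[Set[Flow]] = []
    for f in sorted(flows, key=lambda fl: fl.start):
        for g in groups:
            if h(frozenset(g | {f})):
                g.add(f)
                break
        else:
            groups.append({f})
    return [frozenset(g) for g in groups]


# Example data using documentation-reserved addresses.
a = Flow("192.0.2.1", "198.51.100.10", 443, 0.0)
b = Flow("192.0.2.1", "203.0.113.20", 443, 0.4)
c = Flow("192.0.2.2", "198.51.100.10", 80, 0.5)
d = Flow("192.0.2.1", "198.51.100.10", 443, 30.0)

superflows = decompose([a, b, c, d], same_source_within(2.0))
```

With a two-second window, flows `a` and `b` end up in one group, while `c` (different source) and `d` (too late) each form their own, so four records are summarized as three groups. A different hypothesis applied to the same records would generally yield a different grouping, which is the sense in which several superflows can coexist over one data set.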

Abstract

Network security analysts gather data from diverse sources, from high-level summaries of network flow and traffic volumes to low-level details such as service logs from servers and the contents of individual packets. They validate and check this data against traffic patterns and historical indicators of compromise. Based on the results of this analysis, a decision is made to either automatically manage the traffic or report it to an analyst for further investigation. Unfortunately, due to rapidly increasing traffic volumes, there are far more events to check than operational teams can handle for effective forensic analysis. However, just as packets are grouped into flows that share a commonality, we argue that a high-level construct for grouping network flows into a set of flows that share a hypothesis is needed to significantly improve the quality of operational network response by increasing Events Per Analyst Hour (EPAH). In this paper, we propose a formalism for describing a superflow construct, which we characterize as an aggregation of one or more flows based on an analyst-specific hypothesis about traffic behavior. We demonstrate simple superflow constructions and representations, and perform a case study to explain how the formalism can be used to reduce the volume of data for forensic analysis.

Paper Structure (15 sections, 6 equations, 6 figures)

This paper contains 15 sections, 6 equations, 6 figures.

Figures (6)

  • Figure 1: The Basic Footprints for NetFlow as collected by the router or through PCAP
  • Figure 2: Discrete Sites Contacted to Fetch CNN's Homepage
  • Figure 3: Footprints for Modern Website representation
  • Figure 4: Footprints for Full and Allotted Scan-256
  • Figure 5: Flow Reduction for Full Scan-256 Superflow
  • ...and 1 more figure

Theorems & Definitions (4)

  • Example 1: Chat session hypothesis
  • Example 2: Webpage fetch hypothesis
  • Claim 3: Efficient hypothesis monitoring
  • Claim 4: Superflow decomposition