ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis

Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster

TL;DR

ServeFlow is a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy.

Abstract

Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and more traffic is encrypted. However, over high-bandwidth networks, flows can easily arrive faster than models can run inference on them. The temporal nature of network flows limits the simple scale-out approaches used in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We find that, on the same task, inference time across models can differ by 1.8x to 141.3x, while the inter-packet waiting time is up to 6-8 orders of magnitude higher than the inference time. Based on these insights, we tailor a novel fast-slow model architecture for networking ML pipelines. Flows are assigned to a slower model only when the fast model's inference is deemed highly uncertain. ServeFlow makes inferences on 76.3% of flows in under 16ms, a 40.5x speed-up in median end-to-end serving latency, while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
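The fast-slow routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`least_confidence`, `serve_flow`, `collect_more`), the uncertainty measure (one minus the top class probability), and the threshold value are all assumptions made for illustration. The abstract only states that a flow goes to a slower model when the fast model's inference is deemed uncertain.

```python
from typing import Callable, List, Sequence, Tuple

Probs = Sequence[float]


def least_confidence(probs: Probs) -> float:
    """Uncertainty score: 1 minus the top class probability (an assumed measure)."""
    return 1.0 - max(probs)


def argmax(probs: Probs) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def serve_flow(
    first_packet: List[float],
    fast_model: Callable[[List[float]], Probs],
    slow_model: Callable[[List[float]], Probs],
    collect_more: Callable[[], List[float]],
    threshold: float = 0.3,  # hypothetical value, would need tuning per task
) -> Tuple[int, str]:
    """Return (predicted class, which model answered)."""
    # Step 1: the fast model predicts from the first packet alone.
    fast_probs = fast_model(first_packet)
    if least_confidence(fast_probs) <= threshold:
        return argmax(fast_probs), "fast"

    # Step 2: the fast model was uncertain, so wait for more packets
    # and hand the richer feature vector to the slow model.
    features = first_packet + collect_more()
    return argmax(slow_model(features)), "slow"
```

With toy stand-in models, a confident fast prediction is returned immediately, while an uncertain one triggers packet collection and the slow model. Only the uncertain flows pay the inter-packet waiting cost, which is where the latency saving comes from.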

Paper Structure (34 sections, 13 figures, 10 tables, 2 algorithms)

Figures (13)

  • Figure 1: Networking ML pipeline.
  • Figure 2: Traffic analysis involves a natural tradeoff between latency and accuracy. ServeFlow either reassigns a prediction to a slow, more accurate model (one that waits for more packets as features or uses a more sophisticated design) or keeps the prediction of a fast but less accurate model (the most efficient one, which predicts on the first packet). The chart shows the efficacy of the request assignment algorithm compared to an oracle that assigns models based on ground-truth knowledge of the correctness of classification, and to a random assignment. This plot is derived using the service recognition dataset.
  • Figure 3: Flow collection time (since the arrival of the first packet, in ms) across applications. Note that some flows do not have five (or more) packets, so at the tail, flow collection time would be smaller compared to collecting two packets.
  • Figure 4: ServeFlow, which supports a novel fast-slow model serving architecture, is tailored to the discrete arrival of network flow packets. The arrow width represents the proportion of flows processed at each stage.
  • Figure 5: F1 score vs. end-to-end latency (in milliseconds) across 5 models for the service recognition dataset. We use the Pareto front to determine the placement of models in ServeFlow.
  • ...and 8 more figures
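
The Figure 5 caption mentions using a Pareto front over F1 score and end-to-end latency to place models. As a rough sketch of what such a selection could look like, the function below keeps only models that no other model beats on both latency (lower is better) and F1 (higher is better). The `pareto_front` name and the tuple layout are assumptions, not the paper's code.

```python
from typing import List, Tuple

# (name, end-to-end latency in ms, F1 score); layout is an assumption
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the non-dominated models, ordered from fastest to slowest."""
    # Sort by latency ascending; on ties, put the higher F1 first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front: List[Model] = []
    best_f1 = float("-inf")
    for model in ordered:
        # A slower model earns a place only if it is strictly more accurate
        # than every faster model seen so far.
        if model[2] > best_f1:
            front.append(model)
            best_f1 = model[2]
    return front


if __name__ == "__main__":
    # Made-up numbers for illustration only, not measurements from the paper.
    candidates = [
        ("A", 1.0, 0.80),
        ("B", 2.0, 0.75),
        ("C", 5.0, 0.90),
        ("D", 5.0, 0.85),
        ("E", 20.0, 0.95),
    ]
    for name, latency_ms, f1 in pareto_front(candidates):
        print(name, latency_ms, f1)
```

In a fast-slow setup, the fastest model on the front would be a natural candidate for the fast stage and a more accurate one further along the front for the slow stage.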