Distributed Incast Detection in Data Center Networks

Yiming Zheng, Haoran Qi, Lirui Yu, Zhan Shu, Qing Zhao

TL;DR

The paper addresses incast in data center networks and the limitations of queue-threshold detectors. It introduces DIDIE, a distributed switch-level incast detector that uses a sequential hypothesis test to identify incast from the first arriving packet, enabling fast per-flow decisions. By modeling regular and incast traffic with separate inter-arrival distributions and deriving an optimal inter-arrival threshold $\epsilon$ through ROC-based linear-cost optimization, the method achieves accurate detection with minimal delay and learns key parameters via EWMA. ns-3 experiments show significant improvements in detection speed and accuracy over queue-length baselines, including 0% false positives in real-world traffic, highlighting the method’s practical potential for incast-aware pacing and congestion control.

Abstract

Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection. Existing solutions, including MA-ECN, BurstRadar and Pulser, typically rely on fixed thresholds on switch port egress queue lengths or their gradients to identify microbursts caused by incast flows. However, these queue-length-based methods often suffer from delayed detection and high error rates. In this study, we propose a distributed incast detection method for data center networks at the switch level, leveraging a probabilistic hypothesis test with an optimal detection threshold. By analyzing the arrival intervals of new flows, our algorithm can immediately determine whether a flow is part of incast traffic from its initial packet. The experimental results demonstrate that our method offers significant improvements over existing approaches in both detection speed and inference accuracy.
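To make the idea of classifying a flow from its first packet more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the threshold $\epsilon$ is already known, compares each new flow's start time with the previous one, and keeps an EWMA of the gaps it classifies as regular. The class name `InterArrivalDetector`, the field names, the EWMA weight, and the choice of what the EWMA tracks are all hypothetical choices made for illustration; the paper's hypothesis test and its derivation of $\epsilon$ are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterArrivalDetector:
    """Illustrative detector: flags a new flow that starts soon after the previous one."""

    epsilon: float                            # inter-arrival threshold in seconds (assumed given)
    alpha: float = 0.125                      # EWMA weight (illustrative value, not from the paper)
    last_start: Optional[float] = None        # start time of the previous new flow
    mean_regular_gap: Optional[float] = None  # EWMA of gaps classified as regular

    def on_new_flow(self, t: float) -> bool:
        """Handle the first packet of a new flow seen at time t; return True if flagged."""
        if self.last_start is None:
            # No earlier flow to compare against, so nothing can be flagged yet.
            self.last_start = t
            return False

        gap = t - self.last_start
        self.last_start = t

        is_incast = gap < self.epsilon
        if not is_incast:
            # Update the running estimate using regular gaps only.
            if self.mean_regular_gap is None:
                self.mean_regular_gap = gap
            else:
                self.mean_regular_gap = (
                    (1 - self.alpha) * self.mean_regular_gap + self.alpha * gap
                )
        return is_incast


# Hypothetical flow start times in seconds: two flows start within a few
# microseconds of the flow at t = 0.002.
detector = InterArrivalDetector(epsilon=1e-5)
arrivals = [0.0, 0.002, 0.002004, 0.002007, 0.005]
flags = [detector.on_new_flow(t) for t in arrivals]
```

In this simplified form the first flow of a burst is never flagged, because the decision uses only the gap to the preceding flow. The decision is still made when a flow's first packet arrives, with no dependence on queue buildup, which is the property the abstract contrasts with queue-length detectors.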

Paper Structure

This paper contains 13 sections, 29 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: A Temporal Representation of Regular and Incast Traffic.
  • Figure 2: A 4-to-4 Dumbbell Topology.
  • Figure 3: ROC curves for various values of $\lambda_{11}$.
  • Figure 4: Cost vs. threshold for various values of $\lambda_{11}$.