Confucius: Achieving Consistent Low Latency with Practical Queue Management for Real-Time Communications

Zili Meng, Nirav Atre, Mingwei Xu, Justine Sherry, Maria Apostolaki

TL;DR

Confucius addresses the problem of latency spikes faced by real-time traffic when competing Web flows burst onto shared bottlenecks. It introduces an on-router queue-management approach that gradually reduces the real-time flow’s available bandwidth using age-aware exponential weight updates, while classifying flows by buffer occupancy with hysteresis to preserve fairness. The core contributions include (i) age-aware exponential bandwidth reallocation with per-flow EWMA weighting, (ii) occupancy-based, hysteresis-driven flow classification into a small number of queues, and (iii) a kernel-level qdisc implementation with theoretical bounds showing bounded RT stall and bounded Web-flow degradation. Extensive NS-3 and test-bed evaluations demonstrate substantial stall reductions (often >60%) and near-parity Web performance without requiring end-host labels, suggesting practical deployment viability for last-mile routers.
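As a rough illustration of what occupancy-based classification with hysteresis can look like, the sketch below keeps a flow in its current class until its buffer occupancy crosses a second, separate threshold. This is a minimal sketch, not the paper's rule: the class name `HysteresisClassifier`, the two-class split, and the thresholds `high` and `low` are hypothetical, since this summary does not give the actual classification logic.

```python
from dataclasses import dataclass


@dataclass
class HysteresisClassifier:
    """Two-threshold classifier for one flow (hypothetical illustration)."""

    high: int            # occupancy (bytes) above which the flow becomes "heavy"
    low: int             # occupancy (bytes) below which it returns to "light"
    heavy: bool = False  # current class of the flow

    def update(self, occupancy: int) -> bool:
        """Feed one occupancy sample and return True if the flow is heavy."""
        if not self.heavy and occupancy > self.high:
            self.heavy = True
        elif self.heavy and occupancy < self.low:
            self.heavy = False
        return self.heavy


def classify_trace(samples: list[int], high: int, low: int) -> list[bool]:
    """Classify a sequence of occupancy samples for a single flow."""
    classifier = HysteresisClassifier(high=high, low=low)
    return [classifier.update(s) for s in samples]
```

Because the two thresholds differ, a flow whose occupancy hovers between `low` and `high` keeps its current class instead of flapping between queues on every sample.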

Abstract

Real-time communication applications require consistently low latency, which is often disrupted by latency spikes caused by competing flows, especially Web traffic. We identify the root cause of disruptions in such cases as the mismatch between the abrupt bandwidth allocation adjustment of queue scheduling and gradual congestion window adjustment of congestion control. For example, when a sudden burst of new Web flows arrives, queue schedulers abruptly shift bandwidth away from the existing real-time flow(s). The real-time flow will need several RTTs to converge to the new available bandwidth, during which severe stalls occur. In this paper, we present Confucius, a practical queue management scheme designed for offering real-time traffic with consistently low latency regardless of competing flows. Confucius slows down bandwidth adjustment to match the reaction of congestion control, such that the end host can reduce the sending rate without incurring latency spikes. Importantly, Confucius does not require the collaboration of end-hosts (e.g., labels on packets), nor manual parameter tuning to achieve good performance. Extensive experiments show that Confucius outperforms existing practical queueing schemes by reducing the stall duration by more than 50%, while the competing flows also fairly enjoy on-par performance.
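The mismatch described in the abstract can be made concrete with a toy calculation. The sketch below compares an abrupt drop to the fair share with a share that decays exponentially toward it. It is an illustration under stated assumptions, not the Confucius algorithm: the function names and the per-step `decay` factor are hypothetical, and equal-weight fair sharing among flows is assumed.

```python
def abrupt_share(n_new_flows: int) -> float:
    """Share of one existing flow immediately after n_new_flows join,
    assuming equal-weight fair sharing (an assumption of this sketch)."""
    return 1.0 / (1 + n_new_flows)


def gradual_shares(n_new_flows: int, decay: float, steps: int) -> list[float]:
    """Share of the existing flow when it decays exponentially toward the
    fair share instead of jumping there in one step.

    `decay` in (0, 1) is a hypothetical per-step factor, not a parameter
    taken from the paper.
    """
    target = abrupt_share(n_new_flows)
    share = 1.0  # the existing flow starts with the whole link
    history = []
    for _ in range(steps):
        # Close a fixed fraction of the remaining gap at each step.
        share = target + (share - target) * decay
        history.append(share)
    return history
```

With 9 new flows and `decay = 0.5`, the abrupt scheduler cuts the existing flow from 1.0 to 0.1 at once, while the gradual schedule passes through 0.55, 0.325, and 0.2125 over the first three steps, giving the sender's congestion control time to reduce its rate.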

Paper Structure (40 sections, 33 equations, 29 figures, 4 tables)


Figures (29)

  • Figure 1: The scenario where the real-time flow is affected by competing flows. When Web flows join the competition with the real-time flow, the available bandwidth of the real-time flow will be immediately reduced. Note that even loading one Web page can have tens of concurrent active flows.
  • Figure 2: An existing real-time flow competes with the flows that load the homepage of https://amazon.com, as shown in \ref{fig:intro-example}. The real-time flow, using GCC [ton2017webrtc], always experiences transient stalls during the competition unless flows are pre-labeled by the end host and differentiated by the router.
  • Figure 3: Number of concurrent flows recorded by NetLog [netlog]. OPEN and IN_USE are socket states marked by Chrome, and ACTIVE means that the flow has received bytes within the last 10 ms.
  • Figure 4: Illustration of how bandwidth shares change over time with incoming Web flows and the existing real-time (RT) flow for different schedulers. The dashed red line marks the fair share.
  • Figure 5: When new competing flows join, the service rate of the real-time flow will be immediately reduced, but the CCA takes multiple RTTs to converge.
  • ...and 24 more figures