RIFO: Pushing the Efficiency of Programmable Packet Schedulers
Habib Mostafaei, Maciej Pacut, Stefan Schmid
TL;DR
The paper tackles the resource cost of Push-In First-Out (PIFO) programmable schedulers by proposing Range-In First-Out (RIFO), a lightweight admission-based scheduler that uses only three registers and a single FIFO queue. RIFO applies min–max normalization to compute a relative score $N(r_p) = \frac{r_p - \mathit{Min}}{\mathit{Max} - \mathit{Min}}$ for a packet of rank $r_p$, and admits packets when this score exceeds queue utilization or when they fall within a small guaranteed admission buffer, so scheduling policies can be realized with minimal state. In large-scale NetBench simulations and a 650-line P4 prototype on Tofino switches, RIFO delivers competitive flow completion times and is especially effective for workloads dominated by large flows (flow completion time reduced by up to $4.91\times$ on the datamining workload), while using substantially less hardware (e.g., $2.54\times$ less SRAM than AIFO and $6.55\times$ less than SP-PIFO). The prototype runs at line rate, and the authors release open-source artifacts to support reproducibility and further research on memory-efficient programmable schedulers.
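As a rough illustration of the normalization step only, the score $N(r_p)$ can be sketched in Python as below. The function name `normalized_rank`, the handling of the degenerate case where Max equals Min, and the clamping of out-of-range ranks are assumptions of this sketch, not details taken from the paper or its P4 prototype.

```python
def normalized_rank(rank: int, min_rank: int, max_rank: int) -> float:
    """Min-max normalize a packet rank into [0, 1].

    min_rank and max_rank stand for the tracked Min and Max values.
    """
    if max_rank == min_rank:
        # Assumption: with a single observed rank the formula would divide
        # by zero, so this sketch returns 0.0 instead.
        return 0.0
    # Assumption: ranks outside [Min, Max] are clamped so the score stays in [0, 1].
    clamped = min(max(rank, min_rank), max_rank)
    return (clamped - min_rank) / (max_rank - min_rank)


if __name__ == "__main__":
    # A rank halfway between Min = 10 and Max = 50.
    print(normalized_rank(30, 10, 50))
```

With Min = 10 and Max = 50, a rank of 30 sits halfway through the observed range and maps to a score of 0.5; the admission decision then compares this score against the current queue state.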
Abstract
Packet scheduling is a fundamental networking task that has recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems, such as those based on the Push-In First-Out (PIFO) abstraction, enable flexible scheduling policies but are too resource-expensive for large-scale line-rate operation. This has prompted research into practical programmable schedulers (e.g., SP-PIFO, AIFO) that approximate PIFO behavior on regular hardware. Yet their scalability remains limited by the large number of memory operations they require. To address this, we design an effective yet resource-efficient packet scheduler, Range-In First-Out (RIFO), which uses only three mutable memory cells and one FIFO queue per PIFO queue. RIFO is based on multi-criteria decision-making principles and uses small guaranteed admission buffers. Our large-scale simulations in NetBench demonstrate that, despite using fewer resources, RIFO generally achieves competitive flow completion times across all studied workloads. It is especially effective in workloads with a significant share of large flows, reducing flow completion time by up to $4.91\times$ on the datamining workload compared to state-of-the-art solutions. Our prototype implementation in P4 on Tofino switches requires only 650 lines of code, is scalable, and runs at line rate.
