Kugelblitz: Executable, Cost-Aware Design-Space Exploration for Programmable Packet Pipelines
Artem Ageev, Antoine Kaufmann
TL;DR
Kugelblitz tackles the challenge of designing programmable packet-processing pipelines by making feasibility a primary, executable constraint and coupling it with synthesis-backed cost estimation and full-system evaluation. The framework decouples programs from architectures, uses a SAT-based feasibility checker to prune infeasible designs, automatically generates synthesizable RTL for feasible configurations, and evaluates performance in cycle-accurate full-system simulations with real workloads. Its key contributions include explicit feasibility pruning, a unified flow from program to RTL to system simulation, and demonstrable end-to-end validation against hardware measurements. The approach yields accurate insight into capability-cost trade-offs and reveals non-linear cost growth when supporting richer workloads, enabling principled design-space exploration for SmartNICs and switches.
Abstract
Programmable packet-processing pipelines are a core building block of modern SmartNICs and switches, yet their design requires navigating intertwined trade-offs among program feasibility, hardware cost, and system-level performance. Existing approaches rely on proxy metrics such as stage or ALU count, which often mispredict capability and end-to-end behavior. We present Kugelblitz, a framework for executable, cost-aware design-space exploration of programmable packet pipelines. Kugelblitz decouples packet-processing programs from pipeline architectures and uses compiler-based feasibility checking to prune designs that cannot support target workloads. For feasible architectures, Kugelblitz automatically generates synthesizable RTL, enabling synthesis-backed area and timing estimation and cycle-accurate full-system evaluation with real application workloads. Using representative programs including NAT, firewalling, and an in-network key-value cache, we show that proxy metrics substantially overestimate capability, that performance rankings change under system-level evaluation, and that the cost of supporting richer workloads is highly non-linear.
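To make the exploration flow concrete, the following is a minimal Python sketch of feasibility-pruned design-space exploration. It is not Kugelblitz's implementation: the `Arch` and `Program` types, the greedy `is_feasible` check (a toy stand-in for compiler-based feasibility checking), and the `toy_cost` function (a made-up placeholder for synthesis-backed area estimates) are all hypothetical and chosen only for illustration. The sketch also shows how a proxy metric such as total ALU count can accept an architecture that the feasibility check rejects.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Arch:
    """Hypothetical pipeline architecture description."""
    stages: int           # number of pipeline stages
    alus_per_stage: int   # ALUs available in each stage


@dataclass(frozen=True)
class Program:
    """Hypothetical program model: ALU operations per dependency level."""
    name: str
    ops_per_level: tuple  # level i must finish before level i+1 starts


def proxy_says_ok(arch: Arch, prog: Program) -> bool:
    # Proxy metric: compare total ALU count against total operation count.
    return sum(prog.ops_per_level) <= arch.stages * arch.alus_per_stage


def is_feasible(arch: Arch, prog: Program) -> bool:
    # Toy stand-in for a real feasibility check: each dependency level
    # occupies ceil(ops / alus_per_stage) consecutive stages.
    stages_needed = sum(-(-ops // arch.alus_per_stage)
                        for ops in prog.ops_per_level)
    return stages_needed <= arch.stages


def toy_cost(arch: Arch) -> int:
    # Made-up cost model; a real flow would use synthesis results instead.
    return arch.stages * arch.alus_per_stage


def explore(programs, stage_options, alu_options):
    """Enumerate architectures, prune infeasible ones, sort by toy cost."""
    candidates = (Arch(s, a) for s, a in product(stage_options, alu_options))
    feasible = [arch for arch in candidates
                if all(is_feasible(arch, p) for p in programs)]
    return sorted(feasible, key=toy_cost)


if __name__ == "__main__":
    workloads = [Program("nat", (2, 1)), Program("kv_cache", (3, 1, 1))]
    for arch in explore(workloads, [2, 3, 4], [1, 2, 3]):
        print(arch, "cost:", toy_cost(arch))
```

In this toy model, a 2-stage pipeline with 3 ALUs per stage passes the proxy check for the `kv_cache` program (6 ALUs for 5 operations) but fails the feasibility check, because three dependency levels need at least three stages. Only architectures that survive pruning reach the cost-estimation step.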
