FlexCross: High-Speed and Flexible Packet Processing via a Crosspoint-Queued Crossbar

Klajd Zyla, Marco Liess, Thomas Wild, Andreas Herkersdorf

TL;DR

FlexCross contains a crosspoint-queued crossbar that enables the execution of complex applications by forwarding incoming packets to the required processing engines in the specified sequence. The evaluation shows that it outperforms state-of-the-art flexible packet-processing designs for different traffic loads and scenarios.

Abstract

The fast pace at which new online services emerge leads to a rapid surge in the volume of network traffic. A recent approach that the research community has proposed to tackle this issue is in-network computing, which means that network devices perform more computations than before. As a result, processing demands become more varied, creating the need for flexible packet-processing architectures. State-of-the-art approaches provide a high degree of flexibility at the expense of performance for complex applications, or they ensure high performance but only for specific use cases. In order to address these limitations, we propose FlexCross. This flexible packet-processing design can process network traffic with diverse processing requirements at over 100 Gbit/s on FPGAs. Our design contains a crosspoint-queued crossbar that enables the execution of complex applications by forwarding incoming packets to the required processing engines in the specified sequence. The crossbar consists of distributed logic blocks that route incoming packets to the specified targets and resolve contentions for shared resources, as well as memory blocks for packet buffering. We implemented a prototype of FlexCross in Verilog and evaluated it via cycle-accurate register-transfer level simulations. We also conducted test runs with real-world network traffic on an FPGA. The evaluation results demonstrate that FlexCross outperforms state-of-the-art flexible packet-processing designs for different traffic loads and scenarios. The synthesis results show that our prototype consumes roughly 21% of the resources on a Virtex XCU55 UltraScale+ FPGA.
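
To make the crossbar description above more concrete, the following is a minimal behavioral sketch in Python, not the authors' Verilog implementation. It models one FIFO per crosspoint (input, output pair), one arbiter per output that resolves contention, and packets that carry an ordered list of engines to visit. The class and function names, the port numbering (port 0 as ingress/egress), the queue depth, the round-robin arbiter, and the assumption that every engine finishes in a single step are all illustrative assumptions.

```python
from collections import deque

EGRESS = 0  # assumption: port 0 is both the ingress and the egress port


class CrosspointQueuedCrossbar:
    """Toy model: one FIFO per (input, output) crosspoint, one arbiter per output."""

    def __init__(self, num_inputs, num_outputs, queue_depth=4):
        self.num_inputs = num_inputs
        self.num_outputs = num_outputs
        self.queue_depth = queue_depth  # hypothetical buffer size per crosspoint
        self.queues = [[deque() for _ in range(num_outputs)]
                       for _ in range(num_inputs)]
        self.rr_next = [0] * num_outputs  # round-robin pointer per output

    def enqueue(self, src, dst, packet):
        """Buffer a packet at crosspoint (src, dst); return False if it is full."""
        queue = self.queues[src][dst]
        if len(queue) >= self.queue_depth:
            return False
        queue.append(packet)
        return True

    def schedule(self):
        """One step: each output independently picks at most one packet."""
        delivered = {}
        for dst in range(self.num_outputs):
            for offset in range(self.num_inputs):
                src = (self.rr_next[dst] + offset) % self.num_inputs
                if self.queues[src][dst]:
                    delivered[dst] = self.queues[src][dst].popleft()
                    self.rr_next[dst] = (src + 1) % self.num_inputs
                    break
        return delivered


def run(packets, num_ports, max_steps=100):
    """Forward each (name, engine_sequence) through its engines, then to egress.

    Engines are assumed to sit on ports 1..num_ports-1 and to take one step.
    Returns packet names in the order they leave through the egress port.
    """
    xbar = CrosspointQueuedCrossbar(num_ports, num_ports)
    for name, sequence in packets:
        hops = deque(list(sequence) + [EGRESS])
        # A full crosspoint queue would drop the packet in this toy model.
        xbar.enqueue(EGRESS, hops.popleft(), (name, hops))
    finished = []
    for _ in range(max_steps):
        delivered = xbar.schedule()
        if not delivered:
            break
        for port, (name, hops) in delivered.items():
            if port == EGRESS:
                finished.append(name)
            else:
                # The engine on this port hands the packet back to the crossbar.
                xbar.enqueue(port, hops.popleft(), (name, hops))
    return finished


if __name__ == "__main__":
    # Packet "a" visits engine 1 then engine 2; "b" and "c" visit one engine each.
    print(run([("a", [1, 2]), ("b", [2]), ("c", [1])], num_ports=3))
```

Because each output arbitrates only over the queues in its own column, packets headed to different engines never block one another in this model. This is the property that per-crosspoint buffering is meant to provide.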

Paper Structure (16 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Block diagram of the architecture of FlexCross
  • Figure 2: Block diagram of the architecture of the Crossbar
  • Figure 3: Mean per-packet latency when receiving traffic associated with four flow types at different rates measured in PANIC, the CIOQ crossbar-based design, FlexPipe, and FlexCross with RR scheduling. The dashed lines show the minimum/maximum measured latency.
  • Figure 4: Throughput in % of the bandwidth when receiving traffic at different rates mapped to randomly generated task sequences achieved by PANIC, the CIOQ crossbar-based design, and FlexCross with three different schedulers
  • Figure 5: Mean per-packet latency when receiving traffic at different rates mapped to randomly generated task sequences measured in FlexCross with three different schedulers. The dashed lines show the minimum/maximum measured latency, while the orange indicates the mean processing delay.
  • ...and 1 more figures