EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems

Arsalan Ali Malik, Emre Karabulut, Aydin Aysu

TL;DR

EPOCH introduces a framework based on the processor configuration access port (PCAP) to enable preemption in multi-tenant cloud FPGAs by capturing and restoring fine-grained task state off-chip, allowing interruption at arbitrary clock cycles without restarting from scratch. It leverages frame-level bitstream readback, frame address register (FAR) addressing, and logic-location data to preserve LUT, FF, BRAM, and DSP state, while coordinating all components on a common PS clock and using GSR/RAR to ensure safe reinitialization. The authors claim zero area overhead on the reconfigurable fabric, vendor-independent state capture, and a first generic benchmarking path, validated on Xilinx Zynq SoCs with per-frame save/restore times demonstrated on real hardware (e.g., two-slot PS clock scenario). Complex benchmarks with a CV32E40X core show the method scales to heterogeneous designs, though per-frame readback introduces millisecond-scale context times at 50 MHz PCAP, with potential improvements at higher PCAP speeds. Overall, EPOCH lays a practical foundation for OS-level preemption in cloud FPGA environments and highlights avenues for broader benchmarking, migration to UltraScale, and faster configuration interfaces.
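
As a rough illustration of the FAR addressing mentioned above, the sketch below packs and unpacks a frame address. It assumes the field layout documented for Xilinx 7-series devices (block type in bits 25:23, top/bottom in bit 22, row in bits 21:17, column in bits 16:7, minor in bits 6:0). The names `FrameAddress`, `pack_far`, and `unpack_far` are hypothetical and are not taken from EPOCH.

```python
from typing import NamedTuple


class FrameAddress(NamedTuple):
    """Fields of a frame address (assumed 7-series layout, see lead-in)."""
    block_type: int  # bits [25:23]
    top_bottom: int  # bit  [22]
    row: int         # bits [21:17]
    column: int      # bits [16:7]
    minor: int       # bits [6:0]


# (field name, shift, width) for each field, most significant first.
_FIELDS = (
    ("block_type", 23, 3),
    ("top_bottom", 22, 1),
    ("row", 17, 5),
    ("column", 7, 10),
    ("minor", 0, 7),
)


def pack_far(addr: FrameAddress) -> int:
    """Combine the address fields into a single FAR word."""
    word = 0
    for name, shift, width in _FIELDS:
        value = getattr(addr, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def unpack_far(word: int) -> FrameAddress:
    """Split a FAR word back into its fields."""
    return FrameAddress(
        *((word >> shift) & ((1 << width) - 1) for _, shift, width in _FIELDS)
    )


if __name__ == "__main__":
    far = pack_far(FrameAddress(block_type=1, top_bottom=0, row=2, column=5, minor=3))
    print(hex(far))  # 0x840283
    print(unpack_far(far))
```

A per-frame readback loop would step through such addresses one frame at a time; the values used here are arbitrary and do not correspond to any design in the paper.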

Abstract

FPGAs are increasingly used in multi-tenant cloud environments to offload compute-intensive tasks from the main CPU. The operating system (OS) plays a vital role in identifying tasks suitable for offloading and in coordinating between the CPU and FPGA for seamless task execution. The OS leverages preemption to manage the CPU efficiently and balance CPU time; however, preempting tasks running on FPGAs without context loss remains challenging. Despite growing reliance on FPGAs, vendors have yet to deliver a solution that fully preserves and restores task context. This paper presents EPOCH, the first out-of-the-box framework to seamlessly preserve the state of tasks running on multi-tenant cloud FPGAs. EPOCH enables interrupting a tenant's execution at any arbitrary clock cycle, capturing its state, and saving this 'state snapshot' in off-chip memory at fine granularity. Subsequently, when task resumption is required, EPOCH can resume execution from the saved 'state snapshot', eliminating the need to restart the task from scratch. EPOCH automates intricate processes, shields users from complexities, and synchronizes all underlying logic in a common clock domain, mitigating timing violations and ensuring seamless handling of interruptions. EPOCH captures the state of fundamental FPGA elements, such as look-up tables, flip-flops, block RAMs, and digital signal processing units. On real hardware, a Zynq XC7Z020 SoC, the proposed solution achieves context save and restore operations per frame in 62.2 µs and 67.4 µs, respectively.
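
To put the per-frame figures in perspective, the following back-of-the-envelope sketch scales them to a whole task. It assumes the total time grows linearly with the number of frames and ignores any fixed setup cost. Both are simplifying assumptions, and the 100-frame task is a hypothetical example rather than a measurement from the paper.

```python
# Per-frame times reported in the abstract (microseconds).
SAVE_US_PER_FRAME = 62.2
RESTORE_US_PER_FRAME = 67.4


def context_switch_time_us(n_frames: int) -> tuple[float, float]:
    """Return (save, restore) time in microseconds for n_frames frames.

    Assumes purely linear scaling with no fixed overhead (an assumption,
    not a claim made by the paper).
    """
    if n_frames < 0:
        raise ValueError("n_frames must be non-negative")
    return n_frames * SAVE_US_PER_FRAME, n_frames * RESTORE_US_PER_FRAME


if __name__ == "__main__":
    # Hypothetical task occupying 100 configuration frames.
    save_us, restore_us = context_switch_time_us(100)
    print(f"save: {save_us / 1000:.2f} ms, restore: {restore_us / 1000:.2f} ms")
```

Under these assumptions, a task spanning on the order of a hundred frames already lands in the millisecond range for each of save and restore.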

Paper Structure

This paper contains 25 sections, 1 equation, 5 figures, 3 tables.

Figures (5)

  • Figure 1: An example illustrating preemption support in a multi-tenant FPGA environment. Three tenants share two FPGA slots spatially. At 't3', a higher-priority tenant arrives, prompting the pausing of Tenant-2's work and the preservation of its context. Slot-2 is then allocated to Tenant-3 until completion at 't5'. At 't6', the preserved context of Tenant-2 is restored, enabling a seamless resumption of its execution without any loss of context.
  • Figure 2: The configurable logic block (CLB) of Xilinx 7-Series FPGA serves as the primary mapping space for user logic. The CLB comprises two slices, Slice-L and Slice-M, each containing four 6-input look-up tables, eight storage elements (flip-flops/latches), and carry-chain logic. Slice-M can also be utilized as a 32-bit shift register or to store data via distributed RAM.
  • Figure 3: Block diagram of EPOCH on the Xilinx Zynq SoC with EPOCH executing on the processing system (PS) and a test/benchmark intellectual property (IP) running on the programmable logic (PL) side.
  • Figure 6: The two FPGA layouts employed in our experiments for (a) basic and (b) complex benchmarks. The basic benchmark simulates distinct tenants in a multi-tenant cloud environment and consists of two slots. An example demonstrates the logic of a 4-bit up-counter mapped in Slot-1, while Slot-2 contains the logic mapping of a 4-bit down-counter. In contrast, the complex benchmarks comprise a single slot spanning two clock regions (X0Y0 and X1Y0), encompassing heterogeneous resources, such as LUTs, FFs, BRAMs, and DSP units.
  • Figure 7: The operational workflow of EPOCH consists of four steps. (1) The RISC-V toolchain converts C-based code into a memory initialization file. (2) Using Xilinx Vivado, the synthesis process combines this initialization file with RISC-V RTL code for CV32E40X, resulting in equivalent full and partial design bitstreams. This step is iterated with the per-frame CRC setting enabled to generate the FAR address of each logic element used in a design. (3) The EPOCH preemption code, written in C using the Xilinx SDK/Vitis platform, runs on the PS side. (4) The partial binary files are stored on the SD card, and the FPGA is programmed with the full design bitstream and the EPOCH preemption code. FPGA and PC communication is established using USB-to-serial for IP status reporting and monitoring.
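
The preemption timeline of Figure 1 can be sketched with a small software model. It borrows the 4-bit counters of the basic benchmark in Figure 6 as stand-ins for tenant logic. The model only shows what saving and restoring a state snapshot buys, not how EPOCH performs readback. The class `Counter4`, the function `simulate_slot2`, and the tick counts are all hypothetical.

```python
class Counter4:
    """4-bit counter standing in for a tenant's logic in one slot."""

    def __init__(self, step: int) -> None:
        self.step = step   # +1 for an up-counter, -1 for a down-counter
        self.value = 0     # register contents after a fresh configuration

    def tick(self) -> None:
        """Advance one clock cycle, wrapping at 4 bits."""
        self.value = (self.value + self.step) & 0xF

    def snapshot(self) -> dict:
        """Capture the state that must survive preemption."""
        return {"step": self.step, "value": self.value}

    @classmethod
    def from_snapshot(cls, snap: dict) -> "Counter4":
        """Rebuild the logic and reload its saved register contents."""
        counter = cls(snap["step"])
        counter.value = snap["value"]
        return counter


def simulate_slot2() -> dict:
    off_chip = {}  # models the off-chip memory holding snapshots

    # Tenant-2 runs for three cycles, then is preempted.
    tenant2 = Counter4(step=+1)
    for _ in range(3):
        tenant2.tick()
    off_chip["tenant2"] = tenant2.snapshot()

    # The higher-priority Tenant-3 takes over the slot and runs to completion.
    tenant3 = Counter4(step=-1)
    for _ in range(2):
        tenant3.tick()

    # Tenant-2 resumes from its snapshot and runs one more cycle.
    resumed = Counter4.from_snapshot(off_chip["tenant2"])
    resumed.tick()

    # Without a snapshot, Tenant-2 would start again from zero.
    restarted = Counter4(step=+1)
    restarted.tick()

    return {
        "tenant3": tenant3.value,
        "resumed": resumed.value,
        "restarted": restarted.value,
    }


if __name__ == "__main__":
    print(simulate_slot2())
```

In this model the resumed counter continues from the value it held when it was preempted, whereas the restarted one has lost its earlier progress.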