EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems
Arsalan Ali Malik, Emre Karabulut, Aydin Aysu
TL;DR
EPOCH is a PCAP-based framework that enables preemption on multi-tenant cloud FPGAs: it captures a task's fine-grained state, stores it off-chip, and later restores it, so a task can be interrupted at an arbitrary clock cycle and resumed without restarting from scratch. It uses frame-level bitstream readback, FAR addressing, and logic-location data to preserve LUT, FF, BRAM, and DSP state, runs all components on a common PS clock, and uses GSR/RAR to ensure safe reinitialization. The authors claim zero area overhead on the reconfigurable fabric, vendor-independent state capture, and a first generic benchmarking path. They validate the framework on Xilinx Zynq SoCs and report per-frame save and restore times measured on real hardware (e.g., in a two-slot scenario on the PS clock). Benchmarks built around a CV32E40X core indicate that the method scales to heterogeneous designs, although per-frame readback leads to millisecond-scale context times with the PCAP at 50 MHz; higher PCAP speeds could reduce this. Overall, EPOCH lays a practical foundation for OS-level preemption in cloud FPGA environments and points to future work on broader benchmarking, migration to UltraScale, and faster configuration interfaces.
Abstract
FPGAs are increasingly used in multi-tenant cloud environments to offload compute-intensive tasks from the main CPU. The operating system (OS) plays a vital role in identifying tasks suitable for offloading and in coordinating the CPU and FPGA for seamless task execution. The OS relies on preemption to manage the CPU efficiently and to balance CPU time among tasks; however, preempting tasks running on FPGAs without losing their context remains challenging. Despite the growing reliance on FPGAs, vendors have yet to deliver a solution that fully preserves and restores task context. This paper presents EPOCH, the first out-of-the-box framework to seamlessly preserve the state of tasks running on multi-tenant cloud FPGAs. EPOCH can interrupt a tenant's execution at any arbitrary clock cycle, capture its state, and save this 'state snapshot' in off-chip memory at fine granularity. When the task is to be resumed, EPOCH continues execution from the saved 'state snapshot', eliminating the need to restart the task from scratch. EPOCH automates the intricate steps involved, shields users from their complexity, and synchronizes all underlying logic in a common clock domain, which mitigates timing violations and ensures that interruptions are handled seamlessly. EPOCH captures the state of the fundamental FPGA elements: look-up tables, flip-flops, block RAMs, and digital signal processing units. On real hardware, a Zynq XC7Z020 SoC, the proposed solution performs context save and restore operations in 62.2 µs and 67.4 µs per frame, respectively.
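To give a feel for what the per-frame figures imply, the following minimal sketch estimates the context-switch cost for a region spanning a given number of configuration frames. It assumes that cost scales linearly with the frame count and ignores any fixed setup overhead; both are simplifications made here, not results from the paper. The function name and the frame count in the usage note are hypothetical.

```python
# Per-frame times reported in the abstract (Zynq XC7Z020), in microseconds.
SAVE_US_PER_FRAME = 62.2
RESTORE_US_PER_FRAME = 67.4


def context_switch_estimate_ms(num_frames: int) -> dict:
    """Estimate save/restore time in milliseconds for `num_frames` frames.

    Assumption (not from the paper): cost is strictly linear in the number
    of frames, with no fixed per-operation overhead.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    save_ms = num_frames * SAVE_US_PER_FRAME / 1000.0
    restore_ms = num_frames * RESTORE_US_PER_FRAME / 1000.0
    return {
        "save_ms": save_ms,
        "restore_ms": restore_ms,
        "total_ms": save_ms + restore_ms,
    }
```

Under these assumptions, a hypothetical region of 100 frames would take about 6.22 ms to save and 6.74 ms to restore, which is consistent with the millisecond-scale context times mentioned in the summary.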
