THEMIS: Time, Heterogeneity, and Energy Minded Scheduling for Fair Multi-Tenant Use in FPGAs
Emre Karabulut, Arsalan Ali Malik, Amro Awad, Aydin Aysu
TL;DR
The paper tackles fair multi-tenant scheduling in cloud FPGAs, where heterogeneity and partial reconfiguration (PR) overhead complicate equitable resource sharing. It introduces THEMIS, a latency-aware, energy-aware, and heterogeneity-aware spatiotemporal scheduling policy built on a new workload metric, W_i = A_i × CT_i, and a flexible average allocation, AA_{Proposed} = (A × CT × HMTA) / Total Execution Time, with HMTA_i defined via the workload LCM. The interval length is allowed to vary to tune energy vs. fairness. The approach accounts for non-identical PR regions, reduces unnecessary PR operations, and supports recurring and random tenant demands. Evaluated on a Xilinx Zedboard with MachSuite benchmarks, it improves fairness by 24.2%–98.4% and exposes a trade-off of 55.3× in energy against 69.3× in fairness. The authors also report reduced slot idle time and up to 52.7% energy savings, and they provide open-source code under the MIT License so that cloud providers can implement fair, energy-conscious FPGA multi-tenancy. Overall, THEMIS offers a practical framework for balancing fairness, energy efficiency, and dynamic workloads in heterogeneous FPGA environments.
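To make the two formulas above concrete, here is a minimal Python sketch. It rests on assumptions that this summary does not confirm: HMTA_i is taken to be LCM(W_1, ..., W_n) / W_i, area and compute time are integers in arbitrary units, and the function names, tenant names, and numbers are hypothetical. It is an illustration of the arithmetic, not the authors' implementation.

```python
from functools import reduce
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def workload(area: int, compute_time: int) -> int:
    """Workload metric W_i = A_i * CT_i."""
    return area * compute_time


def average_allocations(tenants: dict, total_execution_time: int) -> dict:
    """Return AA_i = (A_i * CT_i * HMTA_i) / total_execution_time per tenant.

    `tenants` maps a tenant name to an (area, compute_time) pair.
    ASSUMPTION: HMTA_i = LCM of all workloads divided by W_i. The summary
    only says HMTA_i is "defined via the workload LCM".
    """
    workloads = {name: workload(a, ct) for name, (a, ct) in tenants.items()}
    common = reduce(lcm, workloads.values())
    allocations = {}
    for name, w in workloads.items():
        hmta = common // w  # assumed definition, see docstring
        allocations[name] = w * hmta / total_execution_time
    return allocations


# Hypothetical tenants: (area, compute_time) in arbitrary integer units.
example_tenants = {"t1": (2, 3), "t2": (1, 4), "t3": (3, 2)}
example_allocations = average_allocations(example_tenants, total_execution_time=24)
```

Under the assumed definition, every tenant ends up with the same average allocation (here LCM / total time = 12 / 24), which is one way to read the fairness target: tenants with smaller workloads are scheduled more often so that the area-time product evens out.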
Abstract
Using correct design metrics and understanding the limitations of the underlying technology are critical to developing effective scheduling algorithms. Unfortunately, existing scheduling techniques relied on \emph{incorrect} metrics and made \emph{unrealistic} assumptions for fair scheduling of multi-tenant FPGAs, where the aim is for each tenant to receive approximately the same share of resources both spatially and temporally. This paper introduces an enhanced fair scheduling algorithm for multi-tenant FPGA use that addresses these metric and assumption issues with three specific improvements. First, our method ensures spatiotemporal fairness by considering both spatial and temporal aspects, addressing the limitation of prior work that assumed uniform task latency. Second, we incorporate energy considerations into fairness by adjusting scheduling intervals and accounting for energy overhead, thereby balancing energy efficiency with fairness. Third, we acknowledge overlooked aspects of FPGA multi-tenancy, including heterogeneous regions and the constraints on dynamically merging/splitting partially reconfigurable regions. We develop and evaluate our improved fair scheduling algorithm with these three enhancements. Inspired by the Greek goddess of law and personification of justice, we name our fair scheduling solution THEMIS: \underline{T}ime, \underline{H}eterogeneity, and \underline{E}nergy \underline{Mi}nded \underline{S}cheduling. We used the Xilinx Zedboard XC7Z020 to quantify our approach's savings. Compared to previous algorithms, our improved scheduling algorithm improves fairness by 24.2--98.4\% and allows a trade-off of 55.3$\times$ in energy against 69.3$\times$ in fairness. The paper thus informs cloud providers about future scheduling optimizations for fairness, along with the related challenges and opportunities.
