Sharing GPUs and Programmable Switches in a Federated Testbed with SHARY
Stefano Salsano, Andrea Mayer, Paolo Lungaroni, Pierpaolo Loreti, Lorenzo Bracciale, Andrea Detti, Marco Orazi, Paolo Giaccone, Fulvio Risso, Alessandro Cornacchia, Carla Fabiana Chiasserini
TL;DR
The paper tackles the challenge of efficiently sharing scarce and expensive resources, notably GPUs and programmable switches, across federated testbeds. It introduces SHARY as a dynamic reservation platform with an adaptation layer, and couples it with FIGO for GPU orchestration and SUP4RNET for P4 switch reservations, forming an integrated resource-sharing ecosystem. The RESTART-based deployment in Italy demonstrates practical federation between Politecnico di Torino and University of Rome Tor Vergata, illustrating improved utilization and accessibility of heterogeneous resources. Collectively, the approach aims to reduce costs, accelerate AI and networking research, and provide a scalable model for federated resource management across diverse sites and resource types.
Abstract
Federated testbeds enable collaborative research by providing access to diverse resources, including computing power, storage, and specialized hardware such as GPUs, programmable switches, and smart Network Interface Cards (NICs). Efficiently sharing these resources across federated institutions is challenging, particularly when resources are scarce and costly. GPUs are crucial for AI and machine learning research, but their high demand and expense make efficient management essential. Similarly, advanced experimentation on programmable data planes requires very expensive programmable switches (e.g., based on P4) and smart NICs. This paper introduces SHARY (SHaring Any Resource made easY), a dynamic reservation system that simplifies resource booking and management in federated environments. We show that SHARY can be adopted for heterogeneous resources, thanks to an adaptation layer tailored to the specific resource considered. In particular, it can be integrated with FIGO (Federated Infrastructure for GPU Orchestration), which enhances GPU availability through a demand-driven sharing model. By enabling real-time resource sharing and a flexible booking system, FIGO improves access to GPUs, reduces costs, and accelerates research progress. SHARY can also be integrated with the SUP4RNET platform to reserve access to P4 switches.
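
As an illustration of the idea of a resource-agnostic reservation core combined with a per-resource adaptation layer, the following minimal Python sketch may help. It is not SHARY's actual code or API: every name in it (`ResourceAdapter`, `GpuAdapter`, `P4SwitchAdapter`, `ReservationSystem`, `book`, `activate`) is hypothetical, and time slots are simplified to integers. The only point it shows is that booking logic can stay independent of the resource type, while a small adapter handles what is specific to GPUs or P4 switches.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ResourceAdapter(Protocol):
    """Hypothetical adaptation-layer interface, one implementation per resource type."""

    def grant(self, resource_id: str, user: str) -> str: ...


class GpuAdapter:
    """Stand-in for whatever a GPU orchestrator would do to hand over a GPU."""

    def grant(self, resource_id: str, user: str) -> str:
        return f"gpu:{resource_id} assigned to {user}"


class P4SwitchAdapter:
    """Stand-in for whatever a P4 switch platform would do to hand over a switch."""

    def grant(self, resource_id: str, user: str) -> str:
        return f"p4-switch:{resource_id} assigned to {user}"


@dataclass
class Reservation:
    resource_id: str
    user: str
    start: int  # first reserved slot (inclusive)
    end: int    # first slot after the reservation (exclusive)


@dataclass
class ReservationSystem:
    """Resource-agnostic booking core; resource specifics live in the adapters."""

    adapters: dict[str, ResourceAdapter]   # resource kind -> adapter
    resource_kinds: dict[str, str]         # resource id -> resource kind
    bookings: list[Reservation] = field(default_factory=list)

    def book(self, resource_id: str, user: str, start: int, end: int) -> bool:
        """Record a reservation unless it is invalid or overlaps an existing one."""
        if start >= end or resource_id not in self.resource_kinds:
            return False
        for b in self.bookings:
            # Two half-open intervals overlap when each starts before the other ends.
            if b.resource_id == resource_id and start < b.end and b.start < end:
                return False
        self.bookings.append(Reservation(resource_id, user, start, end))
        return True

    def activate(self, reservation: Reservation) -> str:
        """Delegate the actual hand-over to the adapter for this resource kind."""
        kind = self.resource_kinds[reservation.resource_id]
        return self.adapters[kind].grant(reservation.resource_id, reservation.user)


system = ReservationSystem(
    adapters={"gpu": GpuAdapter(), "p4": P4SwitchAdapter()},
    resource_kinds={"gpu-0": "gpu", "sw-0": "p4"},
)
system.book("gpu-0", "alice", 0, 10)   # accepted
system.book("gpu-0", "bob", 5, 15)     # rejected: overlaps alice's slot
system.book("sw-0", "bob", 5, 15)      # accepted: different resource
```

In this sketch, supporting a new resource type only requires writing another adapter class and registering it; the overlap check in `book` is unchanged.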
