Offloading Artificial Intelligence Workloads across the Computing Continuum by means of Active Storage Systems
Alex Barceló, Sebastián A. Cajas Ordoñez, Jaydeep Samanta, Andrés L. Suárez-Cetrulo, Romila Ghosh, Ricardo Simón Carbajo, Anna Queralt
TL;DR
The paper addresses AI data management across the edge-to-cloud continuum by proposing an architecture that embeds computation near the data using an active storage system (dataClay). It shows that offloading AI workloads through active objects can substantially reduce client memory requirements while maintaining accuracy, and it provides an implementation and an evaluation on both constrained edge devices and more capable servers. Key contributions include a complete software architecture, a dataClay-based implementation, and extensive experiments showing memory savings, faster training times on server nodes, and favorable storage trade-offs. The work also explores scaling to distributed workloads in HPC environments. Overall, the approach offers a scalable, resource-efficient path for deploying distributed AI across heterogeneous devices with a low entry barrier for domain experts. The results suggest that active storage can significantly improve data locality and workload distribution in computing continuum scenarios, with practical implications for edge AI and federated-style workflows.
Abstract
The increasing demand for artificial intelligence (AI) workloads across diverse computing environments has driven the need for more efficient data management strategies. Traditional cloud-based architectures struggle to handle the sheer volume and velocity of AI-driven data, leading to inefficiencies in storage, computation, and data movement. This paper explores the integration of active storage systems within the computing continuum to optimize AI workload distribution. By embedding computation directly into the storage layer, active storage reduces data transfer overhead, which in turn improves performance and resource utilization. Existing frameworks and architectures offer mechanisms to distribute certain AI processes across distributed environments; however, they lack the flexibility and adaptability that the continuum requires, with regard to both the heterogeneity of devices and the rapidly changing algorithms and models used by domain experts and researchers. This article proposes a software architecture aimed at seamlessly distributing AI workloads across the computing continuum, and presents its implementation using mainstream Python libraries and dataClay, an active storage platform. The evaluation shows the benefits and trade-offs regarding memory consumption, storage requirements, training times, and execution efficiency across different devices. Experimental results demonstrate that offloading workloads through active storage significantly improves memory efficiency and training speeds while maintaining accuracy. Our findings highlight the potential of active storage to revolutionize AI workload management, making distributed AI deployments more scalable and resource-efficient with a very low entry barrier for domain experts and application developers.
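
To make the idea of embedding computation in storage concrete, the following minimal sketch contrasts passive access (the client pulls the whole dataset and computes locally) with active access (the method runs next to the data and only the result is returned). It is a toy illustration in plain Python and does not use the dataClay API; the `ActiveStore` class, its `put`, `fetch`, and `call` methods, and the sample data are all hypothetical names introduced here for illustration.

```python
import json
import statistics


class ActiveStore:
    """Toy stand-in for an active storage backend (hypothetical, not dataClay)."""

    def __init__(self):
        self._data = {}
        # Methods the store is able to execute next to the stored data.
        self._methods = {"mean": statistics.fmean, "max": max}

    def put(self, key, values):
        self._data[key] = list(values)

    def fetch(self, key):
        # Passive access: serialize the whole dataset and ship it to the client.
        return json.dumps(self._data[key]).encode()

    def call(self, key, method):
        # Active access: run the method where the data lives, ship only the result.
        result = self._methods[method](self._data[key])
        return json.dumps(result).encode()


store = ActiveStore()
store.put("series", (float(i) for i in range(100_000)))

# Passive path: the client receives and holds the full dataset in memory.
passive_payload = store.fetch("series")
local_mean = statistics.fmean(json.loads(passive_payload))

# Active path: the client receives only the computed value.
active_payload = store.call("series", "mean")
remote_mean = json.loads(active_payload)

print(f"passive transfer: {len(passive_payload)} bytes")
print(f"active transfer:  {len(active_payload)} bytes")
print(f"results match:    {local_mean == remote_mean}")
```

In this sketch both paths yield the same value, but the active path moves only a few bytes to the client, which is the data-locality effect the abstract refers to. A real deployment would replace the in-process dictionary with a remote storage service.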
