VECA: Reliable and Confidential Resource Clustering for Volunteer Edge-Cloud Computing
Hemanth Sai Yeddulapalli, Mauro Lemus Alarcon, Upasana Roy, Roshan Lal Neupane, Durbek Gafurov, Motahare Mounesan, Saptarshi Debroy, Prasad Calyam
TL;DR
VECA addresses reliability and confidentiality in Volunteer Edge-Cloud (VEC) computing for ML/DL workflows under volatile resource availability. It combines capacity-based clustering via $k$-means, a two-phase, globally distributed scheduler that uses time-series forecasting with Recurrent Neural Networks, and confidential computing within a trusted execution environment. Evaluation in a Function-as-a-Service testbed built on OpenFaaS and MicroK8s shows that VECA reduces VEC node search latency by about 2x and improves productivity rates after execution failures by over 20%, outperforming VECFlex and VELA. The work enables scalable, private execution of ML workflows on heterogeneous edge resources and points to federated approaches for further tailoring cluster capacities to diverse scientific workloads.
Abstract
Volunteer Edge-Cloud (VEC) computing has significant potential to support scientific workflows in user communities that contribute volunteer edge nodes. However, managing heterogeneous and intermittent resources to support machine/deep learning (ML/DL) based workflows poses challenges in resource governance for reliability, and in confidentiality for model/data privacy protection. There is a need for approaches that handle the volatility of volunteer edge node availability and that scale confidential, data-intensive workflow execution across a large number of VEC nodes. In this paper, we present VECA, a reliable and confidential VEC resource clustering solution featuring three methods tailored for executing ML/DL-based scientific workflows on VEC resources. Firstly, a capacity-based clustering approach enhances system reliability and minimizes VEC node search latency. Secondly, a novel two-phase, globally distributed scheduling scheme optimizes job allocation based on node attributes, using time-series-based Recurrent Neural Networks. Lastly, the integration of confidential computing ensures privacy preservation of the scientific workflows, so that model and data information are not shared with VEC resource providers. We evaluate VECA in a Function-as-a-Service (FaaS) cloud testbed that features OpenFaaS and MicroK8s to support two ML/DL-based scientific workflows, viz., G2P-Deep (bioinformatics) and PAS-ML (health informatics). Experimental results demonstrate that our proposed VECA approach outperforms state-of-the-art methods; in particular, VECA exhibits a two-fold reduction in VEC node search latency and over 20% improvement in productivity rates following execution failures compared to the next best method.
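To illustrate why capacity-based clustering can shorten node search, the following is a minimal sketch and not the VECA implementation. It assumes each node is described only by a hypothetical (CPU cores, RAM in GB) pair, uses a plain Lloyd's $k$-means with a deterministic initialization, and omits feature scaling, availability forecasting, and confidential execution. The node names, capacities, and the `pick_node` helper are illustrative assumptions.

```python
from math import dist

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic init (evenly spaced sorted points)."""
    ordered = sorted(points)
    step = len(ordered) / k
    centroids = [ordered[int(i * step)] for i in range(k)]

    def assign(cents):
        # Label each point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: dist(p, cents[c])) for p in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)

def pick_node(req, names, points, centroids, labels):
    """Search only clusters whose centroid covers the request, smallest cluster first.

    Returns (node_name or None, number_of_nodes_examined).
    """
    examined = 0
    for c in sorted(range(len(centroids)), key=lambda i: centroids[i]):
        if any(cap < need for cap, need in zip(centroids[c], req)):
            continue  # cluster is too small on average; skip it entirely
        for name, p, l in zip(names, points, labels):
            if l != c:
                continue
            examined += 1
            if all(cap >= need for cap, need in zip(p, req)):
                return name, examined
    return None, examined

# Hypothetical volunteer nodes: name -> (CPU cores, RAM in GB).
nodes = {
    "edge-a": (2, 4), "edge-b": (2, 4), "edge-c": (4, 8),
    "edge-d": (16, 64), "edge-e": (16, 64), "edge-f": (24, 64),
    "edge-g": (64, 256), "edge-h": (64, 256), "edge-i": (96, 256),
}
names, points = list(nodes), list(nodes.values())
centroids, labels = kmeans(points, k=3)

# A job that needs at least 8 cores and 32 GB of RAM.
chosen, examined = pick_node((8, 32), names, points, centroids, labels)
print(chosen, "found after examining", examined, "of", len(nodes), "nodes")
```

The design point is that a request is matched against a handful of cluster centroids first, so only the members of one suitable cluster are inspected instead of the whole node pool. A centroid is only an average, so this sketch can skip a cluster that contains a single large node; a production scheduler would need a fallback for that case.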
