VECA: Reliable and Confidential Resource Clustering for Volunteer Edge-Cloud Computing

Hemanth Sai Yeddulapalli, Mauro Lemus Alarcon, Upasana Roy, Roshan Lal Neupane, Durbek Gafurov, Motahare Mounesan, Saptarshi Debroy, Prasad Calyam

TL;DR

VECA addresses reliability and confidentiality in Volunteer Edge-Cloud computing for ML/DL workflows under volatile resource availability. It integrates capacity-based clustering via $k$-means, a two-phase globally distributed scheduler with time-series forecasting from Recurrent Neural Networks, and confidential computing within a trusted execution environment. Evaluation in a Function-as-a-Service emulation with OpenFaaS and MicroK8s shows VECA reduces VEC node search latency by about 2x and improves productivity after failures by over 20%, outperforming VECFlex and VELA. The work enables scalable, private execution of ML workflows on heterogeneous edge resources and points to federated approaches to further tailor cluster capacities to diverse scientific workloads.

Abstract

Volunteer Edge-Cloud (VEC) computing has significant potential to support scientific workflows in user communities that contribute volunteer edge nodes. However, managing heterogeneous and intermittent resources to support machine/deep learning (ML/DL) based workflows poses challenges in resource governance for reliability and in confidentiality for model/data privacy protection. There is a need for approaches that handle the volatility of volunteer edge node availability and scale confidential, data-intensive workflow execution across a large number of VEC nodes. In this paper, we present VECA, a reliable and confidential VEC resource clustering solution that combines three methods tailored for executing ML/DL-based scientific workflows on VEC resources. First, a capacity-based clustering approach enhances system reliability and minimizes VEC node search latency. Second, a novel two-phase, globally distributed scheduling scheme optimizes job allocation based on node attributes, using time-series-based Recurrent Neural Networks. Third, the integration of confidential computing preserves the privacy of the scientific workflows, so that model and data information are not shared with VEC resource providers. We evaluate VECA in a Function-as-a-Service (FaaS) cloud testbed that features OpenFaaS and MicroK8s to support two ML/DL-based scientific workflows, viz. G2P-Deep (bioinformatics) and PAS-ML (health informatics). Experimental results demonstrate that VECA outperforms state-of-the-art methods; in particular, VECA exhibits a two-fold reduction in VEC node search latency and over 20% improvement in productivity rates following execution failures compared to the next best method.
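
To make the clustering and two-phase selection idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each VEC node is described by a hypothetical capacity vector (CPU cores, RAM in GB) and a hypothetical `availability` score that stands in for the RNN forecast; the function names `kmeans` and `pick_node`, the node data, and the "tightest feasible cluster" rule are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break
        centroids = updated
    return centroids, assignment


def meets(capacity, required):
    """True if the capacity covers the requirement in every dimension."""
    return all(have >= need for have, need in zip(capacity, required))


def pick_node(nodes, centroids, assignment, required):
    """Two-phase selection: first a cluster, then a node inside it."""
    # Phase 1 (assumed rule): among clusters whose centroid covers the
    # requirement, take the one closest to it, i.e. the tightest fit.
    feasible = [c for c, cen in enumerate(centroids) if meets(cen, required)]
    if not feasible:
        return None
    cluster = min(feasible, key=lambda c: math.dist(centroids[c], required))
    # Phase 2 (assumed rule): among nodes of that cluster that individually
    # cover the requirement, take the highest forecast availability.
    candidates = [
        n for n, a in zip(nodes, assignment)
        if a == cluster and meets(n["capacity"], required)
    ]
    return max(candidates, key=lambda n: n["availability"]) if candidates else None


# Hypothetical nodes: capacity = (CPU cores, RAM in GB).
nodes = [
    {"name": "n0", "capacity": (2, 4), "availability": 0.90},
    {"name": "n1", "capacity": (2, 8), "availability": 0.60},
    {"name": "n2", "capacity": (4, 8), "availability": 0.70},
    {"name": "n3", "capacity": (16, 64), "availability": 0.50},
    {"name": "n4", "capacity": (16, 48), "availability": 0.95},
    {"name": "n5", "capacity": (32, 64), "availability": 0.80},
]
centroids, assignment = kmeans([n["capacity"] for n in nodes], k=2)
print(assignment)  # [0, 0, 0, 1, 1, 1]
print(pick_node(nodes, centroids, assignment, required=(8, 32))["name"])  # n4
```

Because the scheduler only inspects nodes in one cluster, the search touches three nodes here instead of six, which illustrates how clustering can shorten node search. The sketch clusters on raw values, so RAM dominates the Euclidean distance; a real deployment would likely scale the features first.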

Paper Structure

This paper contains 33 sections, 10 equations, 6 figures, and 2 algorithms.

Figures (6)

  • Figure 1: The VECA solution architecture, in which users submit ML/DL-based workflows to a Cloud Hub. Volunteer resources are clustered using the $k$-means algorithm and secured through a confidential computing framework. A two-phase distributed scheduling mechanism selects the most suitable cluster, and then the optimal VEC node within that cluster, to execute the submitted workflow and meet user performance and security requirements.
  • Figure 2: Elbow plot to determine the optimal number of clusters $k$.
  • Figure 3: Pipeline for the two-phase scheduler.
  • Figure 4: Results on VEC node search latency across 50 workflow instances.
  • Figure 5: Performance of the different approaches over a varying number of workflow instances.
  • ...and 1 more figure
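
The elbow plot in Figure 2 is normally read by eye: the within-cluster sum of squares (WCSS) is plotted against $k$, and the chosen $k$ is where the curve stops dropping sharply. As a hedged sketch of one common way to automate that reading, the code below picks the $k$ with the largest second difference of the WCSS curve. The helper names `wcss` and `elbow_k` are hypothetical, the example values are made up, and nothing here is confirmed to be the procedure used in the paper.

```python
import math


def wcss(points, centroids, assignment):
    """Within-cluster sum of squared distances for a given clustering."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))


def elbow_k(inertias):
    """Return the k (1-based) with the largest second difference of the WCSS curve.

    inertias[i] is the WCSS obtained with k = i + 1 clusters.
    """
    if len(inertias) < 3:
        raise ValueError("need WCSS values for at least three consecutive k")
    best_k, best_bend = None, -math.inf
    for i in range(1, len(inertias) - 1):
        # A large positive second difference marks a sharp bend in the curve.
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up WCSS values for k = 1..5, for illustration only (not from the paper).
example_curve = [1000.0, 300.0, 250.0, 220.0, 200.0]
print(elbow_k(example_curve))  # 2
```

The second-difference rule cannot select the first or last $k$ in the range, so the range should extend at least one step on either side of the plausible values.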