Efficient Training Approaches for Performance Anomaly Detection Models in Edge Computing Environments

Duneesha Fernando, Maria A. Rodriguez, Patricia Arroba, Leila Ismail, Rajkumar Buyya

TL;DR

This work addresses performance anomaly detection in resource-constrained edge environments by bridging the gap between training-efficient generic models and highly accurate per-device models. It introduces two clustering-based training approaches, ICPTL and CM, both of which first group edge devices whose normal data distributions are similar. ICPTL preserves accuracy comparable to the model-per-device (MPD) approach with significantly fewer training cycles, while CM further improves efficiency by training far fewer cluster-level models, often outperforming a generic model. Evaluations on the Server Machine Dataset (SMD) and an emulated edge setup show that the autoencoder (AE) is a robust anomaly detector and that the proposed clustering-based methods achieve favorable accuracy-efficiency trade-offs, enabling scalable deployment across large, heterogeneous edge fleets.

Abstract

Microservice architectures are increasingly used to modularize IoT applications and deploy them in distributed and heterogeneous edge computing environments. Over time, these microservice-based IoT applications become susceptible to performance anomalies caused by resource hogging (e.g., CPU or memory), resource contention, and similar issues, which can degrade their Quality of Service and violate their Service Level Agreements. Existing research on performance anomaly detection for edge computing environments focuses on model training approaches that either achieve high accuracy at the expense of a time-consuming and resource-intensive training process or prioritize training efficiency at the cost of lower accuracy. To address this gap, while considering the resource constraints and the large number of devices in modern edge platforms, we propose two clustering-based model training approaches: (1) intra-cluster parameter transfer learning-based model training (ICPTL) and (2) cluster-level model training (CM). These approaches aim to find a trade-off between the training efficiency of anomaly detection models and their accuracy. We compared the models trained under ICPTL and CM to models trained for specific devices (most accurate, least efficient) and a single general model trained for all devices (least accurate, most efficient). Our findings show that the model accuracy of ICPTL is comparable to that of the model-per-device approach while requiring only 40% of the training time. In addition, compared to ICPTL, CM further improves training efficiency by requiring 23% less training time and reducing the number of trained models by approximately 66%, while still achieving a higher accuracy than a single general model.
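To make the difference between the four training strategies concrete, the sketch below counts how many models each one would train for a small fleet. It is an illustration under stated assumptions, not the authors' implementation: the device names, the two-value profiles, the greedy distance-threshold clustering, and the functions `cluster_devices` and `training_plan` are all hypothetical. It also assumes that ICPTL trains one model from scratch per cluster and fine-tunes a copy of its parameters for every other device in that cluster, which the abstract implies but does not spell out.

```python
from math import dist

# Hypothetical per-device profiles: mean of two normal-operation metrics
# (e.g. CPU and memory utilisation). Values are made up for illustration.
profiles = {
    "edge-1": (0.20, 0.30),
    "edge-2": (0.22, 0.31),
    "edge-3": (0.80, 0.75),
    "edge-4": (0.79, 0.77),
    "edge-5": (0.50, 0.10),
}


def cluster_devices(profiles, threshold):
    """Greedy clustering (an assumption, not the paper's algorithm).

    A device joins the first cluster whose seed device (its first member)
    lies within `threshold` Euclidean distance; otherwise it seeds a new
    cluster.
    """
    clusters = []
    for device_id, profile in profiles.items():
        for members in clusters:
            if dist(profile, profiles[members[0]]) <= threshold:
                members.append(device_id)
                break
        else:
            clusters.append([device_id])
    return clusters


def training_plan(clusters):
    """Count training runs and deployed models for each strategy."""
    n_devices = sum(len(members) for members in clusters)
    n_clusters = len(clusters)
    return {
        # Model per device: every device trains its own model from scratch.
        "MPD": {"from_scratch": n_devices, "fine_tuned": 0,
                "deployed_models": n_devices},
        # Single general model trained once on data from all devices.
        "GM": {"from_scratch": 1, "fine_tuned": 0, "deployed_models": 1},
        # Cluster-level model: one model per cluster, shared by its members.
        "CM": {"from_scratch": n_clusters, "fine_tuned": 0,
               "deployed_models": n_clusters},
        # ICPTL (assumed): one base model per cluster, then a fine-tuned
        # copy for each remaining device in the cluster.
        "ICPTL": {"from_scratch": n_clusters,
                  "fine_tuned": n_devices - n_clusters,
                  "deployed_models": n_devices},
    }


clusters = cluster_devices(profiles, threshold=0.1)
plan = training_plan(clusters)
for strategy, counts in plan.items():
    print(strategy, counts)
```

The counts show where the savings would come from under these assumptions: CM deploys only as many models as there are clusters, while ICPTL still ends up with one model per device but replaces most from-scratch training runs with shorter fine-tuning runs.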

Paper Structure

This paper contains 16 sections, 10 equations, 11 figures, 6 tables, and 3 algorithms.

Figures (11)

  • Figure 1: Properties of applications placed in edge computing environments
  • Figure 2: Visual representation of the 2 clustering-based training approaches and the 2 baseline approaches
  • Figure 3: Positioning of the two proposed clustering-based training approaches
  • Figure 4: Variance of mean of metrics across SMD devices
  • Figure 5: Distribution of AUC and F1 score of anomaly detection algorithms across devices in SMD
  • ...and 6 more figures