MELODY: Robust Semi-Supervised Hybrid Model for Entity-Level Online Anomaly Detection with Multivariate Time Series

Jingchao Ni, Gauthier Guinet, Peihong Jiang, Laurent Callot, Andrey Kan

TL;DR

MELODY tackles entity-level online anomaly detection in cloud deployments where anomalies manifest across heterogeneous multivariate time series. It introduces Online Feature Extraction to map entity MTS into a shared feature space and SemiAD, a hybrid detector combining a semi-supervised deep one-class model (SemiDOC) with a supervised LightGBM classifier. The framework uses two combination strategies, MELODY-M and MELODY-S, to balance false positives and detection performance. Experiments on a large real-world Amazon deployment dataset show substantial improvements in F1-score over state-of-the-art methods and confirm practical benefits for auto-rollback systems.

Abstract

In large IT systems, software deployment is a crucial process for online services, whose code is regularly updated. However, a faulty code change may degrade the target service's performance and cause cascading outages in downstream services. Thus, software deployments should be comprehensively monitored, and their anomalies should be detected in a timely manner. In this paper, we study the problem of anomaly detection for deployments. We begin by identifying the challenges unique to this problem, which operates at the entity level (e.g., deployments), relative to the more typical problem of anomaly detection in multivariate time series (MTS). These challenges include the heterogeneity of deployments, the low latency tolerance, the ambiguous definition of anomalies, and the limited supervision. To address them, we propose a novel framework, semi-supervised hybrid Model for Entity-Level Online Detection of anomalY (MELODY). MELODY first transforms the MTS of different entities into the same feature space with an online feature extractor, then uses a newly proposed semi-supervised deep one-class model to detect anomalous entities. We evaluated MELODY on real cloud-service data with 1.2M+ time series. The relative F1 score improvement of MELODY over state-of-the-art methods ranges from 7.6% to 56.5%. A user evaluation suggests that MELODY is suitable for monitoring deployments in large online systems.
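To make the idea of a shared feature space concrete, here is a minimal Python sketch of how entities with different numbers of metrics and different series lengths could be mapped to vectors of one fixed length. It is not the paper's online feature extractor: the function name `extract_features`, the chosen statistics, and the pooling scheme are all assumptions made for illustration, and the sketch works on complete series rather than in an online fashion.

```python
from statistics import mean, pstdev


def extract_features(mts):
    """Map one entity's MTS to a fixed-length vector.

    `mts` is a list of metric series (each a non-empty list of floats).
    Hypothetical featurization, not the paper's extractor: compute a few
    statistics per series, then pool them across series so that the output
    length does not depend on the metric count or the series length.
    """
    per_series = []
    for series in mts:
        mu = mean(series)
        sigma = pstdev(series)
        # Largest absolute deviation from the mean in units of std (0 if flat).
        peak = max(abs(x - mu) for x in series) / sigma if sigma > 0 else 0.0
        per_series.append((mu, sigma, peak))

    # Pool each statistic across series with mean and max.
    pooled = []
    for column in zip(*per_series):
        pooled.extend([mean(column), max(column)])
    return pooled


# Two toy entities with different shapes (2 x 4 and 3 x 6).
entity_a = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 10.0]]
entity_b = [[0.5] * 6, [1.0, 1.0, 9.0, 1.0, 1.0, 1.0], [3.0, 2.0, 3.0, 2.0, 3.0, 2.0]]

features_a = extract_features(entity_a)
features_b = extract_features(entity_b)
```

Both toy entities end up as vectors of the same length, so a single downstream detector could in principle score them; which features the paper actually uses is not stated in the abstract.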

Paper Structure (31 sections, 7 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: An illustration of (a) point-level anomaly detection, and (b) entity-level anomaly detection.
  • Figure 2: An illustration of (a) the system architecture, and (b) the inference process of the MELODY framework.
  • Figure 3: The performance of MELODY-S w.r.t. (a) the ensemble size, and (b) the margin $\delta$ in Eq. \ref{eq.hinge}.
  • Figure 4: The t-SNE visualization of the embeddings of SemiDOC using Hard Labels on (a) the train set, and (b) the test set.