CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift

Jiongchi Yu, Xiaofei Xie, Qiang Hu, Bowen Zhang, Ziming Zhao, Yun Lin, Lei Ma, Ruitao Feng, Frank Liauw

TL;DR

This work introduces CAShift, a shift-aware cloud attack dataset designed to evaluate log-based anomaly detection (LAD) under normality shift across cloud components and attack surfaces. It benchmarks a suite of LAD methods, including semantic-aware AE/VAE and prediction-based models, against three cloud-specific normality shifts and 20 attack scenarios, revealing that all LAD approaches degrade under shift (up to 34% in F1) while continuous learning can mitigate this degradation with method- and budget-dependent gains (up to ~27% for VAE). The study also demonstrates that shift types, especially cloud-architecture shifts, pose unique challenges and can cause significant false positives for some models. The findings highlight the need for robust, multi-modal, and carefully tuned shift-adaptation strategies to enable reliable LAD in dynamic cloud environments and provide guidance for future research directions in LAD shift robustness.

Abstract

With the rapid advancement of cloud-native computing, securing cloud environments has become an important task. Log-based Anomaly Detection (LAD) is the most representative technique for attack detection and safety assurance across different systems, and multiple LAD methods and relevant datasets have been proposed. However, even the datasets prepared specifically for cloud systems cover only limited cloud behaviors and lack information from a whole-system perspective. Another critical issue is normality shift: the test distribution can differ from the training distribution, which strongly affects LAD performance. Unfortunately, existing works focus only on simple shift types such as chronological changes and ignore other cloud-specific shift types. Therefore, a dataset that captures diverse cloud system behaviors and various types of normality shift is essential. To fill this gap, we construct CAShift, a dataset for evaluating LAD performance in the cloud, which considers the different roles of software in cloud systems, supports three real-world normality shift types, and features 20 different attack scenarios across various cloud system components. Based on CAShift, we evaluate the effectiveness of existing LAD methods in normality shift scenarios. Additionally, to explore the feasibility of shift adaptation, we investigate three continuous learning approaches to mitigate the impact of distribution shift. Results demonstrate that 1) all LAD methods suffer from normality shift, with performance dropping by up to 34%, and 2) existing continuous learning methods are promising for addressing the effects of shift, but their configurations strongly affect shift adaptation. Based on our findings, we offer implications for future research on designing more robust LAD models and methods for LAD shift adaptation.

Paper Structure

This paper contains 24 sections, 2 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Common attack surfaces in cloud systems.
  • Figure 2: Overview of our benchmarking framework.
  • Figure 3: T-SNE visualization of shift logs compared to attack logs and normal logs.
  • Figure 4: The difference of the frequency (%) of top 10 system call names between shift distributions and the base distribution.
  • Figure 5: F1-Scores achieved by each LAD method under normality shift.
  • ...and 1 more figure