System States Forecasting of Microservices with Dynamic Spatio-Temporal Data

Yifei Xu, Jingguo Ge, Haina Tang, Shuai Ding, Tong Li, Hui Li

TL;DR

STMformer addresses microservices state forecasting by integrating dynamic topology and network-connection information into a spatio-temporal Transformer framework, augmented with PatchCrossAttention (PCA) to capture cascading effects. The model combines intra-host (IMM), spatial (SMM), and temporal (TMM) message modules, including TB for intrinsic trends and PCA for global spatio-temporal attention, to forecast multi-node, multivariate time series. Evaluations on a dedicated microservices dataset show superior short-term performance and competitive long-term results, with MAE reduced by 8.6% and MSE by 2.2% relative to strong baselines, supporting the method’s effectiveness for AIOps in dynamic cloud environments. The work also introduces a code-free data collection pipeline and a scalable dataset, both practical contributions for operational forecasting and for understanding cascading effects in microservice ecosystems.
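
The TL;DR describes PCA as global spatio-temporal attention. As a simplified illustration of what "global" means here, the following sketch flattens a multi-node, multivariate window into one token per (node, time step) pair and lets every token attend to every other. A spike at one instance can therefore influence the representation of another instance at a different time step. This is not STMformer's PatchCrossAttention. The sketch makes several assumptions: it uses plain scaled dot-product attention with identity query/key/value projections, and it omits patching, the separate node and edge chains, random feature mapping, and Gumbel-softmax sampling. All function names and the input window are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_tokens(window: List[List[Vector]]) -> List[Vector]:
    """window[node][t] is the metric vector of one node at time step t.
    Returns one token per (node, time step) pair, in node-major order."""
    return [vec for node_series in window for vec in node_series]


def global_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention where every token attends to all tokens.
    Assumption: identity Q/K/V projections (no learned weights)."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output is a convex combination of all (node, time) tokens.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


# Hypothetical window: 2 nodes x 3 time steps x 2 metrics (e.g. CPU, network loss).
window = [
    [[0.1, 0.0], [0.2, 0.0], [0.9, 0.8]],  # node 0: spike at t=2
    [[0.1, 0.0], [0.1, 0.0], [0.1, 0.1]],  # node 1
]
tokens = flatten_tokens(window)
attended = global_attention(tokens)
print(len(tokens), len(attended[0]))  # prints: 6 2
```

Plain attention like this costs time quadratic in the number of tokens (N×T for N nodes and T time steps). Figure 4 reports a random feature mapping dimension for TMM-PCA, which suggests the paper uses an approximation to keep this tractable. The sketch does not attempt to reproduce it.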

Abstract

In the AIOps (Artificial Intelligence for IT Operations) era, accurately forecasting system states is crucial. In microservices systems, this task faces the challenge of dynamic and complex spatio-temporal relationships among microservice instances. These relationships arise primarily from dynamic deployments, diverse call paths, and cascading effects among instances. Current time-series forecasting methods, which focus mainly on intrinsic patterns, are insufficient in environments where spatial relationships are critical. Similarly, spatio-temporal graph approaches often neglect the nature of temporal trends, concentrating mostly on message passing between nodes. Moreover, current research in the microservices domain frequently underestimates the importance of network metrics and topological structures in capturing the evolving dynamics of systems. This paper introduces STMformer, a model tailored for forecasting system states in microservices environments and capable of handling multi-node, multivariate time series. Our method leverages dynamic network connection data and topological information to help model the intricate spatio-temporal relationships within the system. Additionally, we integrate the PatchCrossAttention module to compute the impact of cascading effects globally. We have developed a dataset based on a microservices system and conducted comprehensive experiments comparing STMformer with leading methods. In both short-term and long-term forecasting tasks, our model consistently achieved an 8.6% reduction in MAE (Mean Absolute Error) and a 2.2% reduction in MSE (Mean Squared Error). The source code is available at https://github.com/xuyifeiiie/STMformer.
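
The headline results are relative reductions in two standard error metrics. The following is a minimal sketch of how MAE, MSE, and a percentage reduction against a baseline are usually computed. The paper may average over nodes, variables, and forecast horizons differently. The error values in the usage line are hypothetical and are chosen only to illustrate what an 8.6% reduction means.

```python
from typing import Sequence


def _check_lengths(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs before averaging."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: the average of |y - y_hat|."""
    _check_lengths(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: the average of (y - y_hat)^2, so large misses dominate."""
    _check_lengths(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percentage by which model_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical values: baseline MAE 0.500, model MAE 0.457.
print(round(relative_reduction(0.500, 0.457), 1))  # prints 8.6
```

Because MSE squares each error, it weights occasional large misses (such as fault-induced spikes) more heavily than MAE does. A given model can therefore improve the two metrics by different percentages.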

Paper Structure

This paper contains 19 sections, 10 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: (a) shows multiple relationships in the microservices system. The blue circles represent the observation metrics of a pod. (b) shows instance metrics of ts-seat-service and ts-order-service under two fault injections. The first row shows network loss faults and the second row shows CPU stress faults.
  • Figure 2: The framework of STMformer. The seasonal component from time series decomposition serves as input to both encoder and decoder. The encoder output is combined with the trend to integrate trend information. In the decoder, this combined output is fused with the original seasonal data to update and refine the state representation.
  • Figure 3: The PatchCrossAttention procedure for computing global attention among all time steps and nodes. The upper chain (blue) computes global attention by nodes. The lower chain (green) computes global attention by edges.
  • Figure 4: Impact of hyperparameters on model performance (32 prediction steps). (a) Evaluation based on different model dimensions. (b) Evaluation based on the choice of kernel length for sequence decomposition. (c) Evaluation based on the number of attention heads. (d) Evaluation based on the number of attention heads. (e) Evaluation based on the dimension of the random feature mapping in TMM-PCA. (f) Evaluation based on the number of samples for Gumbel-softmax in TMM-PCA.
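
Figure 2 feeds the seasonal component of a time-series decomposition to both the encoder and the decoder, and Figure 4(b) varies the kernel length used for that decomposition. Autoformer popularized a common way to implement this kind of split: compute a moving-average trend and take the seasonal part as the residual. The sketch below assumes that approach with edge-replicate padding. The paper's exact decomposition may differ, and the function names and the sample series are hypothetical.

```python
from typing import List, Tuple


def moving_average_trend(x: List[float], kernel: int) -> List[float]:
    """Centered moving average. Edges are padded by repeating the first/last value
    so the trend has the same length as the input (assumed padding scheme)."""
    if not x:
        raise ValueError("x must be non-empty")
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    pad = (kernel - 1) // 2
    padded = [x[0]] * pad + list(x) + [x[-1]] * pad
    return [sum(padded[i:i + kernel]) / kernel for i in range(len(x))]


def decompose(x: List[float], kernel: int) -> Tuple[List[float], List[float]]:
    """Split a series into (seasonal, trend), with seasonal = x - trend."""
    trend = moving_average_trend(x, kernel)
    seasonal = [v - t for v, t in zip(x, trend)]
    return seasonal, trend


# Hypothetical per-pod CPU series with a short burst at t=3.
cpu = [0.20, 0.21, 0.22, 0.80, 0.23, 0.24, 0.25]
seasonal, trend = decompose(cpu, kernel=3)
```

A longer kernel smooths the trend more and pushes more short-lived fluctuation, such as the burst above, into the seasonal part. The kernel-length sweep in Figure 4(b) probes this trade-off.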