Prepared for the Unknown: Adapting AIOps Capacity Forecasting Models to Data Changes

Lorena Poenaru-Olaru, Wouter van 't Hof, Adrian Stando, Arkadiusz P. Trawinski, Eileen Kapel, Jan S. Rellermeyer, Luis Cruz, Arie van Deursen

TL;DR

This work examines how capacity-forecasting AIOps models should adapt to data changes in a real-world ING setting. It compares drift-detection-driven retraining, using the FEDD detector, against periodic (monthly) retraining to assess the impact on accuracy and retraining cost for a two-week forecast horizon. Across 16 CPU/memory time series, drift-based retraining generally maintains forecasting accuracy, measured by the mean absolute scaled error (MASE), while halving retraining frequency. A notable exception (Machine 3) favors periodic retraining, which highlights the potential for a hybrid approach. The study offers practical guidance on the scalable deployment of drift-aware forecasting, including design considerations and the need to handle missing data and abrupt changes in time series.

Abstract

Capacity management is critical for software organizations to allocate resources effectively and meet operational demands. An important step in capacity management, predicting future resource needs, often relies on data-driven analytics and machine learning (ML) forecasting models, which require frequent retraining to stay relevant as the data evolves. Continuously retraining the forecasting models can be expensive and difficult to scale, posing a challenge for engineering teams tasked with balancing accuracy and efficiency. Retraining only when the data changes appears to be a more computationally efficient alternative, but its impact on accuracy requires further investigation. In this work, we investigate the effects of retraining capacity forecasting models for time series based on detected changes in the data, compared to periodic retraining. Our results show that drift-based retraining achieves forecasting accuracy comparable to periodic retraining in most cases, making it a cost-effective strategy. However, when the data changes rapidly, periodic retraining is still preferred to maximize forecasting accuracy. These findings offer actionable insights for software teams to enhance forecasting systems, reducing retraining overhead while maintaining robust performance.
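The two retraining policies compared in the abstract can be contrasted with a small sketch. It splits a series into consecutive windows, assumes the model is first trained on window 0, and records the window indices at which each policy would retrain. The detector here, `mean_shift_detector`, is a hypothetical stand-in (a simple mean-shift test), not FEDD; all function names and the threshold `k` are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

# A detector compares a reference window with the current window.
Detector = Callable[[Sequence[float], Sequence[float]], bool]


def mean_shift_detector(reference: Sequence[float], current: Sequence[float],
                        k: float = 3.0) -> bool:
    """Hypothetical stand-in for FEDD: flag drift when the current window's
    mean is more than k reference standard deviations from the reference mean."""
    sd = pstdev(reference) or 1e-9  # avoid a zero threshold on flat data
    return abs(mean(current) - mean(reference)) > k * sd


def periodic_retrain_points(n_windows: int, every: int) -> List[int]:
    """Periodic policy: retrain every `every` windows after the initial training."""
    return list(range(every, n_windows, every))


def drift_retrain_points(windows: Sequence[Sequence[float]],
                         detect: Detector) -> List[int]:
    """Drift-based policy: retrain only when the detector flags a change."""
    reference = windows[0]  # model assumed to be trained on the first window
    points: List[int] = []
    for i in range(1, len(windows)):
        if detect(reference, windows[i]):
            points.append(i)
            # After retraining, the latest window becomes the new reference.
            reference = windows[i]
    return points
```

On a series with a single level shift, e.g. three windows of `[10, 11, 10, 11]` followed by three of `[20, 21, 20, 21]`, the drift-based policy retrains once (at window 3), while a periodic policy with `every=2` retrains at windows 2 and 4. This illustrates the cost trade-off the abstract describes; in a real pipeline the retraining step would refit the forecasting model on the accumulated history rather than only update a reference window.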

Paper Structure

This paper contains 28 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Retraining the forecasting model based on drift detection.
  • Figure 2: Drift Detection Block components and functionality.
  • Figure 3: CPU Utilization
  • Figure 4: Memory Utilization
  • Figure 5: Time series of CPU utilization for Machine 1 (a) and Machine 3 (b), including the moments when FEDD detected drift and the moments when the forecasting model was retrained.