On Software Ageing Indicators in OpenStack
Yevhen Yazvinskyi, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao
TL;DR
This study addresses software ageing in distributed cloud systems by comparing memory-based indicators (including swap) with request response time in OpenStack. It extends the SWARE ageing framework with memory metrics and a post-rejuvenation phase, employing accelerated workloads across 12 scenarios (all-in-one and multi-node) to observe ageing dynamics using Mann-Kendall trend tests and Sen's slope. The results show that response time, especially when combined with workload frequency and error analysis, provides a robust signal of ageing, while memory indicators can misalign with actual ageing trends under certain configurations or high concurrency. The work highlights practical implications for rejuvenation strategies, proposes methodological refinements (e.g., adding a wait phase), and contributes an open-source testbed and web toolset for reproducible cloud-ageing experiments.
Abstract
Distributed systems in general, and cloud systems in particular, are susceptible to failures that can lead to substantial economic and data losses, security breaches, and even potential threats to human safety. Software ageing is one such vulnerability. It emerges from the routine reuse of computational system units, which induces fatigue within the components, resulting in an increased failure rate and potential system breakdown. Because of its stochastic nature, ageing cannot be measured directly; instead, ageing indicators are used as proxies. While there are dozens of studies on different ageing indicators, their comprehensive comparison in different settings remains underexplored. In this paper, we compare two ageing indicators, using OpenStack as a use case. Specifically, our evaluation compares memory usage (including swap memory) and request response time, as both are readily available indicators. By executing multiple OpenStack deployments with varying configurations, we conduct a series of experiments and analyse the ageing indicators. Comparative analysis through statistical tests provides insights into the strengths and weaknesses of the utilised ageing indicators. Finally, through an in-depth analysis of other OpenStack failures, we identify underlying failure patterns and their impact on the studied ageing indicators.
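The statistical tests named in the TL;DR, the Mann-Kendall trend test and Sen's slope, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a textbook version written with the Python standard library only, it assumes equally spaced samples without autocorrelation correction, and the function names and the sample response times are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S is the number of increasing pairs minus decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, with the standard correction for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if var_s == 0:
        # All observations are identical: no trend can be detected.
        return s, 0.0, 1.0
    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return s, z, 2.0 * (1.0 - cdf)


def sens_slope(values):
    """Return Sen's slope: the median of all pairwise slopes."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    slopes = [
        (values[j] - values[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)


if __name__ == "__main__":
    # Hypothetical response times (seconds) sampled at equal intervals.
    response_times = [0.41, 0.43, 0.42, 0.47, 0.49, 0.52, 0.51, 0.58, 0.61, 0.66]
    s, z, p = mann_kendall(response_times)
    print(f"S={s}, Z={z:.3f}, p={p:.4f}, slope={sens_slope(response_times):.4f}")
```

In this reading, a small p-value together with a positive Sen's slope on an indicator such as response time would suggest an upward trend consistent with ageing, while the slope gives a robust estimate of how fast the indicator degrades per sampling interval.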
