BIPeC: A Combined Change-Point Analyzer to Identify Performance Regressions in Large-scale Database Systems
Zhan Lyu, Thomas Bach, Yong Li, Nguyen Minh Le, Lars Hoemke
TL;DR
This work tackles automated detection of performance regressions in SAP HANA by analyzing large-scale time-series performance metrics. It introduces BIPeC, a framework that couples Bayesian change-point detection with the Pruned Exact Linear Time (PELT) algorithm, augmented by preprocessing, a stepwise detection-refinement pipeline, and a feedback loop for continuous improvement. The approach uses a Bayes factor $B = \frac{P(D|H_2)}{P(D|H_1)}$ with Poisson likelihoods and MCMC-based integration, alongside a PELT objective $C(D,n) + \beta n$ with an RBF-based cost, to accurately identify change points. Empirical results on public datasets and SAP HANA BMDB show that BIPeC delivers higher precision and F1 scores than traditional CPD methods, demonstrating improved accuracy, scalability, and practical impact for proactive performance management in large-scale databases.
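To make the Bayes factor above concrete, here is a minimal sketch for a single change point in Poisson count data. It is not the paper's implementation. The summary describes MCMC-based integration, whereas this sketch swaps in a conjugate Gamma prior on the Poisson rate so that both marginal likelihoods have a closed form. The prior parameters `a` and `b`, the uniform prior over the change position, and the function names are all assumptions made for illustration.

```python
import math


def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts with a Gamma(a, b) prior on the rate.

    a is the shape and b the rate of the prior (both assumed values).
    """
    n = len(counts)
    s = sum(counts)
    return (
        a * math.log(b)
        - math.lgamma(a)
        + math.lgamma(a + s)
        - (a + s) * math.log(b + n)
        - sum(math.lgamma(x + 1) for x in counts)
    )


def log_bayes_factor(counts, a=1.0, b=1.0):
    """Return (log B, most likely split index) for B = P(D|H2) / P(D|H1).

    H1: one Poisson rate for the whole series.
    H2: one change at an unknown position, uniform prior over positions,
        with independent rates before and after the change.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    log_h1 = log_marginal(counts, a, b)
    # One term per candidate split t: counts[:t] and counts[t:].
    terms = [
        log_marginal(counts[:t], a, b) + log_marginal(counts[t:], a, b)
        for t in range(1, n)
    ]
    # Log-sum-exp over split positions, then apply the uniform prior 1/(n-1).
    m = max(terms)
    log_h2 = m + math.log(sum(math.exp(v - m) for v in terms)) - math.log(n - 1)
    best_split = 1 + terms.index(m)
    return log_h2 - log_h1, best_split
```

A positive log Bayes factor favors the change-point hypothesis $H_2$, and a negative one favors the no-change hypothesis $H_1$. For non-count metrics such as elapsed time, a different likelihood would be needed.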
Abstract
Performance testing in large-scale database systems such as SAP HANA is a crucial yet labor-intensive task that involves extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, which makes early detection of performance regressions challenging. We address these issues by proposing an automated approach to detect performance regressions in such measurements. Our approach integrates Bayesian inference with the Pruned Exact Linear Time (PELT) algorithm and detects change points and performance regressions with higher precision and efficiency than previous approaches. Our method minimizes false negatives and helps ensure the reliability and performance quality of SAP HANA. The proposed solution can accelerate testing and contribute to more sustainable performance management practices in large-scale data management environments.
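The PELT component minimizes a penalized objective of the form $C(D,n) + \beta n$, where $n$ is the number of change points. The sketch below illustrates that idea on a one-dimensional series. It is a simplified stand-in, not the authors' code: it uses a squared-error cost around each segment mean instead of the RBF-based cost named in the summary, and the function name `pelt_changepoints` and the penalty values are hypothetical.

```python
def pelt_changepoints(signal, beta):
    """Return sorted change-point indices minimizing segment cost + beta per change.

    Cost of a segment is its sum of squared deviations from the segment mean
    (an assumed cost, chosen for simplicity).
    """
    n = len(signal)
    # Prefix sums give each segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Squared-error cost of signal[i:j], with i < j."""
        seg = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg * seg / (j - i)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -beta          # so the first segment carries no penalty
    last = [0] * (n + 1)     # last[t]: start index of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        vals = [(best[s] + cost(s, t) + beta, s) for s in candidates]
        best[t], last[t] = min(vals)
        # PELT pruning: drop starts that can never be optimal again.
        candidates = [s for v, s in vals if v - beta <= best[t]] + [t]

    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        t = last[t]
        changepoints.append(t)
    return sorted(changepoints)


series = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0,
          5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0]
print(pelt_changepoints(series, beta=2.0))
```

The penalty $\beta$ controls sensitivity: a small value admits more change points, while a very large value suppresses them entirely.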
