Rethinking Performance Analysis for Configurable Software Systems: A Case Study from a Fitness Landscape Perspective
Mingyu Huang, Peili Mao, Ke Li
TL;DR
In configurable software systems, it is hard to understand how configurations map to performance because the configuration space is black-box and high-dimensional. The authors propose a fitness landscape perspective and GraphFLA, a graph-based framework that models configuration spaces as landscapes and enables scalable analysis. They conduct a large-scale case study on LLVM, Apache, and SQLite across 32 workloads, collecting over 86 million configurations, and report six key findings about fitness distributions, prominent regions, ruggedness, optima distributions, and option interactions, with implications for tuning and performance modeling. They also show how surrogate models and optimization procedures behave in rugged landscapes, and they provide open data and a flexible toolkit for researchers to analyze configurable systems.
Abstract
Modern software systems are often highly configurable so that they can be tailored to the varied requirements of diverse stakeholders. Understanding the mapping between configurations and the desired performance attributes plays a fundamental role in advancing the controllability and tuning of the underlying system, yet this mapping has long been poorly understood due to its black-box nature. While there have been previous efforts in performance analysis for these systems, they analyze configurations as isolated data points without considering their inherent spatial relationships. This renders them incapable of interrogating many important aspects of the configuration space, such as local optima. In this work, we advocate a novel perspective to rethink performance analysis: modeling the configuration space as a structured ``landscape''. To support this proposition, we designed GraphFLA, an open-source fitness landscape analysis (FLA) framework powered by graph data mining. By applying this framework to $86$M benchmarked configurations from $32$ running workloads of $3$ real-world systems, we arrived at $6$ main findings, which together constitute a holistic picture of the landscape topography, with thorough discussions of their implications for both configuration tuning and performance modeling.
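To make the landscape idea concrete, here is a minimal sketch of treating configurations as nodes connected to their one-option-different neighbors, then locating local optima. It is an illustration only and does not reflect GraphFLA's actual API. The three binary options, the `toy_runtime` function, and its coefficients are hypothetical, and lower runtime is assumed to be better.

```python
from itertools import product


def neighbors(config):
    """Yield configurations that differ from `config` in exactly one binary option."""
    for i in range(len(config)):
        flipped = list(config)
        flipped[i] = 1 - flipped[i]
        yield tuple(flipped)


def local_optima(fitness):
    """Return configurations strictly better (lower) than all of their neighbors."""
    optima = []
    for config, value in fitness.items():
        if all(value < fitness[n] for n in neighbors(config) if n in fitness):
            optima.append(config)
    return sorted(optima)


def toy_runtime(config):
    """Hypothetical runtime with an interaction between options a and b."""
    a, b, c = config
    return 10 - 3 * a - 2 * b + 4 * a * b + c


# Enumerate the full (tiny) configuration space and measure each point.
fitness = {cfg: toy_runtime(cfg) for cfg in product((0, 1), repeat=3)}

optima = local_optima(fitness)
best = min(fitness, key=fitness.get)
print("local optima:", optima)
print("global optimum:", best, "runtime:", fitness[best])
```

In this toy example, the `4 * a * b` interaction term is what produces a second local optimum. Without it the landscape would have a single basin. Viewing configurations as isolated points would not expose this structure, whereas the neighborhood relation does.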
