Pushing the Boundary: Specialising Deep Configuration Performance Learning

Jingzhi Gong

TL;DR

This thesis systematically surveys deep learning for configuration performance modeling, revealing key gaps in encoding choices, sparsity handling, and dynamic environments. It introduces DaL, a divide-and-learn framework that partitions data to train local models, improving accuracy across real-world systems with limited data. To address multi-environment settings, it proposes SeMPL, a sequential meta-learning approach that trains on related environments in a deliberate order, yielding accuracy improvements of up to 99% and data-efficiency speedups of up to 3.86×. Collectively, these contributions advance the precision and robustness of performance prediction for highly configurable software, supporting more reliable tuning and runtime adaptation in practice.

Abstract

Software systems often have numerous configuration options that can be adjusted to meet different performance requirements. However, understanding the combined impact of these options on performance is challenging, especially when real-world data is limited. To tackle this issue, deep learning techniques have gained popularity due to their ability to capture complex relationships even with limited samples. This thesis begins with a systematic literature review of deep learning techniques in configuration performance modeling, analyzing 85 primary papers selected from 948 search results. The review identifies knowledge gaps and sets three objectives for the thesis. The first knowledge gap is the lack of understanding about which encoding scheme is better and under what circumstances. To address this, the thesis conducts an empirical study comparing three popular encoding schemes and provides actionable suggestions to support more reliable encoding decisions. Another knowledge gap is the sparsity inherited from the configuration landscape. To handle this, the thesis proposes a model-agnostic and sparsity-robust framework called DaL, which uses a "divide-and-learn" approach. DaL outperforms state-of-the-art approaches in accuracy across various real-world systems. The thesis also addresses the limitation of predicting only under static environments by proposing a sequential meta-learning framework called SeMPL. Unlike traditional meta-learning frameworks, SeMPL trains on the meta-environments in a specialized order, resulting in significantly improved prediction accuracy in multi-environment scenarios. Overall, the thesis identifies and addresses critical knowledge gaps in deep performance learning, significantly advancing the accuracy of performance prediction.
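
The "divide-and-learn" idea mentioned for DaL (partition the data, then train one local model per partition) can be illustrated with a minimal sketch. This is not the thesis's implementation: the depth-1 regression-tree split, the one-dimensional least-squares local models, the class name `DivideAndLearnSketch`, the `local_feature` parameter, and the toy system with a `cache` flag and a `threads` count are all assumptions made purely for illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Depth-1 regression-tree split: the (feature, threshold) with lowest total SSE."""
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        raise ValueError("no feature takes more than one value")
    return best[1], best[2]


def fit_line(xs, ys):
    """One-dimensional ordinary least squares; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / var if var else 0.0
    return my - slope * mx, slope


class DivideAndLearnSketch:
    """Hypothetical illustration: divide the samples once, learn one model per division."""

    def fit(self, X, y, local_feature):
        self.local_feature = local_feature
        # Divide: pick the split that best separates the performance values.
        self.split_feature, self.threshold = best_split(X, y)
        # Learn: fit an independent local model on each side of the split.
        self.local_models = {}
        for side in (False, True):
            pairs = [(row, yi) for row, yi in zip(X, y)
                     if (row[self.split_feature] > self.threshold) == side]
            xs = [row[local_feature] for row, _ in pairs]
            ys = [yi for _, yi in pairs]
            self.local_models[side] = fit_line(xs, ys)
        return self

    def predict(self, config):
        # Route the configuration to its division, then use that local model.
        side = config[self.split_feature] > self.threshold
        intercept, slope = self.local_models[side]
        return intercept + slope * config[self.local_feature]


# Toy data (made up): latency depends on threads very differently with and without cache.
X = [(cache, threads) for cache in (0, 1) for threads in (1, 2, 3, 4)]
y = [100 - 10 * t if c == 0 else 20 - 2 * t for c, t in X]

model = DivideAndLearnSketch().fit(X, y, local_feature=1)
print(model.split_feature, model.threshold)
print(model.predict((0, 5)), model.predict((1, 5)))
```

In this toy data the two halves of the configuration space follow different linear trends, so a single line fitted over all samples would fit neither half well, whereas each local model only has to capture the behavior of its own division.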

Paper Structure (282 sections, 20 equations, 41 figures, 49 tables, 5 algorithms)

Figures (41)

  • Figure 1: Overview of the structure of this thesis.
  • Figure 2: The process model of the design science research methodology used in this study.
  • Figure 3: Cumulative number of primary studies on deep configuration performance learning models.
  • Figure 4: Deep learning pipeline for performance modeling.
  • Figure 5: Overview of the systematic literature review protocol.
  • ...and 36 more figures