Predicting Configuration Performance in Multiple Environments with Sequential Meta-learning

Jingzhi Gong; Tao Chen

TL;DR

This work tackles the prediction of software configuration performance across heterogeneous environments by introducing SeMPL, a sequential meta-learning framework. Unlike parallel meta-learning, SeMPL trains on the meta-environments one by one in a carefully chosen order, which lets the meta-model weight each environment's contribution differently and gives a better initialization for unseen target environments. The approach combines sequence selection, meta-training with a base learner (DeepPerf by default), and fine-tuning on the target environment. It is evaluated on nine systems with 3–10 environments. Empirical results show SeMPL outperforming both single-environment models and other multi-environment baselines, with up to 99% accuracy improvement and up to 3.86x speedup, highlighting its practical potential for robust configuration performance modeling.

Abstract

Learning and predicting the performance of given software configurations are of high importance to many software engineering activities. While configurable software systems will almost certainly face diverse running environments (e.g., version, hardware, and workload), current work often either builds performance models under a single environment or fails to properly handle data from diverse settings, hence restricting their accuracy for new environments. In this paper, we target configuration performance learning under multiple environments. We do so by designing SeMPL, a meta-learning framework that learns the common understanding from configurations measured in distinct (meta) environments and generalizes it to the unforeseen target environment. What makes it unique is that, unlike common meta-learning frameworks (e.g., MAML and MetaSGD) that train the meta environments in parallel, we train them sequentially, one at a time. The order of training naturally allows discriminating the contributions among meta environments in the meta-model built, which better fits configuration data, known to differ dramatically between environments. By comparing with 15 state-of-the-art models under nine systems, our extensive experimental results demonstrate that SeMPL performs considerably better on 89% of the systems with up to 99% accuracy improvement, while being data-efficient, leading to a maximum of 3.86x speedup. All code and data can be found at our repository: https://github.com/ideas-labo/SeMPL.
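The idea of training meta environments one at a time and then fine-tuning on the target can be illustrated with a toy sketch. This is not the SeMPL implementation: a linear model trained by per-sample gradient descent is swapped in for the DeepPerf base learner, the training order is fixed by hand rather than selected, and every environment, coefficient, and function name below is a hypothetical chosen for illustration.

```python
# Illustrative sketch only: a linear model stands in for the paper's base
# learner, and all environments below are made-up toy data.
from itertools import product

def predict(weights, config):
    """Linear performance model: bias plus weighted sum of option values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], config))

def train(weights, samples, lr=0.05, epochs=200):
    """Per-sample gradient descent on squared error, starting from `weights`."""
    w = list(weights)
    for _ in range(epochs):
        for config, perf in samples:
            err = predict(w, config) - perf
            w[0] -= lr * err
            for i, x in enumerate(config):
                w[i + 1] -= lr * err * x
    return w

def sequential_meta_train(meta_envs, order, n_options):
    """Train on the meta environments one at a time, in the given order."""
    w = [0.0] * (n_options + 1)
    for name in order:
        w = train(w, meta_envs[name])  # each stage starts from the previous result
    return w

def mre(weights, samples):
    """Mean relative error in percent over (config, performance) pairs."""
    total = sum(abs(predict(weights, c) - p) / abs(p) for c, p in samples)
    return 100.0 * total / len(samples)

def measure(true_weights):
    """Hypothetical environment: 'measure' all configurations of two binary options."""
    return [(c, predict(true_weights, c)) for c in product([0, 1], repeat=2)]

# Two meta environments and one unseen target that behaves most like "B".
meta_envs = {"A": measure([10.0, 2.0, 3.0]), "B": measure([20.0, 4.0, 1.0])}
target = measure([21.0, 4.0, 1.5])
few_target_samples = [target[0], target[3]]  # only two measured configurations

meta_b_last = sequential_meta_train(meta_envs, ["A", "B"], n_options=2)
meta_a_last = sequential_meta_train(meta_envs, ["B", "A"], n_options=2)

# Fine-tune the meta-model on the few target samples; compare with no meta-model.
fine_tuned = train(meta_b_last, few_target_samples, epochs=20)
from_scratch = train([0.0, 0.0, 0.0], few_target_samples, epochs=20)

for label, w in [("meta, B last", meta_b_last), ("meta, A last", meta_a_last),
                 ("fine-tuned", fine_tuned), ("from scratch", from_scratch)]:
    print(f"{label}: target MRE = {mre(w, target):.2f}%")
```

In this toy setting, the environment trained last dominates the meta-model, so placing the environment most similar to the target at the end of the sequence gives a better starting point for fine-tuning than the reverse order or than starting from zero weights.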

Paper Structure (44 sections, 6 equations, 8 figures, 1 table, 3 algorithms)

Introduction
Preliminaries and Related Work
Single Environment Configuration Performance Learning
Configuration Performance Learning with Multiple Environment Inputs
Joint Learning for Configuration Performance
Transfer Configuration Performance Learning
Multi-Task Configuration Performance Learning
Meta-Learning for Configuration Performance
The Theory behind SeMPL for Configuration Performance Learning
The Sequence Matters
Train Later Contributes More
More Meta Environments are Beneficial
Implementing and Engineering SeMPL
Sequence Selection
How to assess the usefulness of meta environments to the target environment?
...and 29 more sections

Figures (8)

Figure 1: Workflow of MAML and the proposed SeMPL. The meta-model can be produced by any base learner.
Figure 2: Illustrating the distributions of the model parameter values in different situations under a real-world software system; the base learner is a regularized Deep Neural Network (it is best viewed in color). The x- and y-axis are model parameters and their corresponding performance values, respectively.
Figure 3: Empirical results that verify the properties of real-world software systems. The y-axis is the testing Mean Relative Error (MRE) on $\boldsymbol{\mathcal{E}}_{target}$. (a) confirms Properties 1 and 2; $\boldsymbol{\mathcal{E}}_{3}$ is the most useful environment for $\boldsymbol{\mathcal{E}}_{target}$, followed by $\boldsymbol{\mathcal{E}}_{2}$ and then $\boldsymbol{\mathcal{E}}_{1}$. (b) reveals Property 3.
Figure 4: The SeMPL architecture for learning configuration performance of a system with multiple environments.
Figure 5: SeMPL versus single environment models. For simplicity of exposition, we report the log-transformed average MRE (and its standard error) over all target environments and runs. For speedup ($sp = \frac{b}{s}$), denotes the mean MRE for $b$; ✕ indicates the point of $s$. Detailed data can be accessed at: https://github.com/ideas-labo/SeMPL/blob/main/Figure5_full.pdf.
...and 3 more figures
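
The two metrics named in the captions of Figures 3 and 5, MRE and the speedup $sp = b/s$, can be sketched as follows. MRE is computed in its usual form, the mean of |actual - predicted| / |actual|, expressed in percent. The captions do not spell out $b$ and $s$, so the sketch assumes that $b$ is the training sample size at which a baseline reaches its best MRE and $s$ is the smallest sample size at which the compared model matches that MRE. The function names and learning-curve numbers are hypothetical.

```python
def mean_relative_error(actual, predicted):
    """MRE in percent: average of |actual - predicted| / |actual|."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def smallest_size_reaching(curve, mre_goal):
    """Smallest training size whose MRE is at or below mre_goal, else None."""
    sizes = [n for n, err in sorted(curve.items()) if err <= mre_goal]
    return sizes[0] if sizes else None

def speedup(baseline_curve, other_curve):
    """sp = b / s under the assumption stated above; None if never matched."""
    # b: size at which the baseline reaches its best (lowest) MRE.
    b, best_mre = min(baseline_curve.items(), key=lambda kv: (kv[1], kv[0]))
    # s: smallest size at which the other model is at least as accurate.
    s = smallest_size_reaching(other_curve, best_mre)
    return None if s is None else b / s

# Hypothetical learning curves: training sample size -> testing MRE (%).
baseline_curve = {10: 12.0, 20: 9.0, 40: 8.0}
other_curve = {10: 7.5, 20: 6.0, 40: 5.0}

print("MRE:", mean_relative_error([100.0, 200.0], [110.0, 180.0]))
print("speedup:", speedup(baseline_curve, other_curve))
```

Under this reading, a speedup above 1 means the compared model needs fewer measured configurations than the baseline to reach the same accuracy.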
