Dividable Configuration Performance Learning

Jingzhi Gong, Tao Chen, Rami Bahsoon

TL;DR

DaL is a model-agnostic and sparsity-robust framework for predicting configuration performance. It is based on the new paradigm of dividable learning, which builds a model via “divide-and-learn”. DaL considerably improves different global models when they are used as its underlying local models, which further strengthens its flexibility.

Abstract

Machine/deep learning models have been widely adopted for predicting the configuration performance of software systems. However, a crucial yet unaddressed challenge is how to cater for the sparsity inherited from the configuration landscape: both the influence of configuration options (features) and the distribution of data samples are highly sparse. In this paper, we propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL, based on the new paradigm of dividable learning that builds a model via "divide-and-learn". To handle sample sparsity, the samples from the configuration landscape are divided into distant divisions, for each of which we build a sparse local model, e.g., a regularized Hierarchical Interaction Neural Network, to deal with the feature sparsity. A newly given configuration is then assigned to the model of the right division for the final prediction. Further, DaL adaptively determines the optimal number of divisions required for a system and sample size without any extra training or profiling. Experimental results from 12 real-world systems and five sets of training data reveal that, compared with the state-of-the-art approaches, DaL performs no worse than the best counterpart in 44 out of 60 cases, with up to 1.61x improvement in accuracy; requires fewer samples to reach the same or better accuracy; and incurs acceptable training overhead. In particular, the mechanism that adapts the parameter d reaches the optimal value in 76.43% of the individual runs. The results also confirm that the paradigm of dividable learning is more suitable than other similar paradigms, such as ensemble learning, for predicting configuration performance. Practically, DaL considerably improves different global models when they are used as the underlying local models, which further strengthens its flexibility.
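
The "divide-and-learn" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single variance-minimizing split (what a depth-1 CART would produce), a placeholder local model that predicts the mean of its division, and reuse of the split rule to assign a new configuration to a division. The names `best_split`, `mean_model`, and `divide_and_learn`, as well as the toy data, are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return the (feature, threshold) pair that minimizes the total SSE
    of the two resulting divisions, as a depth-1 CART would."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring feature values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, f, t)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1], best[2]


def mean_model(X_div, y_div):
    """Placeholder local model (assumption): predict the division's mean."""
    m = mean(y_div)
    return lambda config: m


def divide_and_learn(X, y, train_local=mean_model):
    """Divide the samples once, then train one local model per division."""
    f, t = best_split(X, y)
    divisions = {True: ([], []), False: ([], [])}
    for row, yi in zip(X, y):
        X_div, y_div = divisions[row[f] <= t]
        X_div.append(row)
        y_div.append(yi)
    models = {side: train_local(X_div, y_div)
              for side, (X_div, y_div) in divisions.items()}

    def predict(config):
        # Assign the new configuration to a division, then ask its model.
        return models[config[f] <= t](config)

    return predict


# Toy data (hypothetical): option 0 is a binary switch that shifts
# performance drastically; option 1 has only a minor influence.
X = [[0, 1], [0, 2], [0, 3], [1, 1], [1, 2], [1, 3]]
y = [10.0, 11.0, 12.0, 100.0, 102.0, 104.0]

predict = divide_and_learn(X, y)
print(predict([0, 2]), predict([1, 5]))
```

Because `train_local` is a parameter, any learner that returns a prediction function could be swapped in for the mean predictor, which mirrors the model-agnostic claim at a very small scale.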

Paper Structure

This paper contains 58 sections, 10 equations, 11 figures, 8 tables, 2 algorithms.

Figures (11)

  • Figure 1: Projection of configurations in the landscape using t-SNE. Note that the t-SNE dimensions are newly extracted features that do not correspond to any actual configuration options of the systems.
  • Figure 2: The architecture of DaL.
  • Figure 3: Projection of CART for VP8 showing the possible divisions with different colors under alternative depth $d$.
  • Figure 4: The changing optimal $d$ on DaL depending on the software systems being modeled and the training/testing data across 30 runs.
  • Figure 5: Comparison of HV and $\mu$HV for divisions with $d=1$ and $d=2$ under system x264. The distinctly colored area indicates the individual HV value calculated based on one (for $\mu$HV) or more divisions (for HV). (a) shows that $D_1$ is the nondominated division for $d=1$ while $D_3$ and $D_4$ are the nondominated ones for $d=2$. In (b), the original HV value equals the area between $D_1$ alone and the reference nadir point, i.e., HV$=19825.43$. In (c), the original HV value equals the non-overlapped area from $D_3$ and $D_4$ to the reference nadir point, i.e., HV$=88025.51$. In (d), $\mu$HV is the mean over the HV value of the area between $D_1$ and the reference nadir point together with that of the area between $D_2$ and the reference nadir point, i.e., $\mu$HV$=(83951.55+18216.65)/2=51084.10$. In (e), $\mu$HV is the mean over the HV value for the area of each of the divisions $D_3$, $D_4$, $D_5$, and $D_6$, i.e., $\mu$HV$=(131212.91+30873.60+5862.67+1014.70)/4=42240.97$. With the original HV, $d=2$ would be chosen, whereas with our proposed $\mu$HV, $d=1$ would be chosen. In fact, here $d=1$ tends to be better in general considering all the possible divisions in terms of $h$ and $z$. The actual validation also confirms that $d=1$ leads to a $2.28\times$ more accurate result than $d=2$.
  • ...and 6 more figures
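
The $\mu$HV arithmetic in the Figure 5 caption can be checked with a short sketch. The HV values are copied from the caption. The names `mu_hv`, `hv_per_division`, and `original_hv`, and the rule that the depth with the largest score is chosen, are assumptions made for illustration, read off from the caption's statement about which $d$ each indicator would select.

```python
from statistics import mean

# Per-division HV values for system x264, in the order the caption lists them.
hv_per_division = {
    1: [83951.55, 18216.65],                     # divisions for d=1
    2: [131212.91, 30873.60, 5862.67, 1014.70],  # divisions for d=2
}

# Original HV values reported in the caption for each depth.
original_hv = {1: 19825.43, 2: 88025.51}


def mu_hv(hv_values):
    """Mean of the individual per-division HV values."""
    return mean(hv_values)


mu_scores = {d: mu_hv(values) for d, values in hv_per_division.items()}

# Assumption: the depth with the largest indicator value is preferred.
d_by_mu_hv = max(mu_scores, key=mu_scores.get)
d_by_hv = max(original_hv, key=original_hv.get)

for d, score in sorted(mu_scores.items()):
    print(f"d={d}: muHV={score:.2f}, HV={original_hv[d]:.2f}")
print("chosen by muHV:", d_by_mu_hv, "| chosen by HV:", d_by_hv)
```

Under these assumptions the sketch reproduces the caption: $\mu$HV favors $d=1$, while the original HV favors $d=2$.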