How Low Can We Go? Minimizing Interaction Samples for Configurable Systems

Dominik Krupke; Ahmad Moradi; Michael Perk; Phillip Keldenich; Gabriel Gehrke; Sebastian Krieter; Thomas Thüm; Sándor P. Fekete

TL;DR

Configurable software yields a combinatorial explosion of configurations, making exhaustive testing impractical; $t$-wise interaction sampling instead covers the configuration space with a small set of configurations. The authors present SampLNS, a duality-based framework that provides provable lower bounds and an improved sampler for minimizing $t$-wise interaction samples, combining CP-SAT, MILP, and Large Neighborhood Search to scale. On 47 small and medium-sized models, SampLNS finds smaller samples than previous methods in 85% of the cases and proves optimality for 63% of all instances, substantially reducing testing resources. The work offers a principled method to certify solution quality in hard combinatorial sampling problems and lays groundwork for extending these ideas to larger, more complex configurable systems.

Abstract

Modern software systems are typically configurable, a fundamental prerequisite for wide applicability and reusability. This flexibility poses an extraordinary challenge for quality assurance, as the enormous number of possible configurations makes it impractical to test each of them separately. This is where t-wise interaction sampling can be used to systematically cover the configuration space and detect unknown feature interactions. Over the last two decades, numerous algorithms for computing small interaction samples have been studied, providing improvements for a range of heuristic results; nevertheless, it has remained unclear how much these results can still be improved. We present a significant breakthrough: a fundamental framework, based on the mathematical principle of duality, for combining near-optimal solutions with provable lower bounds on the required sample size. This implies that we no longer need to work on heuristics with marginal or no improvement, but can certify the solution quality by establishing a limit on the remaining gap; in many cases, we can even prove optimality of achieved solutions. This theoretical contribution also provides extensive practical improvements: Our algorithm SampLNS was tested on 47 small and medium-sized configurable systems from the existing literature. SampLNS can reliably find samples of smaller size than previous methods in 85% of the cases; moreover, we can achieve and prove optimality of solutions for 63% of all instances. This makes it possible to avoid cumbersome efforts of minimizing samples by researchers as well as practitioners, and substantially save testing resources for most configurable systems.

Paper Structure (37 sections, 2 theorems, 10 equations, 5 figures, 1 table, 2 algorithms)

Introduction
Our Contributions
Preliminaries
Validity of Configurations
t-Wise Interaction Samples
Algorithmic Techniques
Duality and Quality Certificates
Lower bounds: Certifying the quality of solutions
Characterizing Mutual Exclusiveness
Extended Use of Invalid Interactions
Large Neighborhood Search for Lower Bounds
Example
Computing (near-) optimal solutions
Modeling Pairwise Sampling in CP-SAT
Large Neighborhood Search
...and 22 more sections

Key Result

Lemma 1

Consider a primal problem $\mathcal{A}: \min\{a(x)\mid x\in X\}$ and a dual problem $\mathcal{B}: \max\{b(y)\mid y\in Y\}$. Let $x^*\in X$ and $y^*\in Y$ such that $a(x^*)=b(y^*)$. Then $x^*$ is optimal for $\mathcal{A}$.
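Lemma 1 can be made concrete on a toy pairwise instance. The sketch below is an illustration under stated assumptions, not the authors' implementation: the model (three Boolean features without constraints) is hypothetical, and all function names are chosen for this example. The primal value $a(x)$ is the size of a sample that covers every feasible pairwise interaction; the dual value $b(y)$ is the size of a set of mutually exclusive interactions. Since no configuration can cover two mutually exclusive interactions, every sample needs at least $b(y)$ configurations, so matching values certify a minimum sample.

```python
from itertools import combinations, product

# Hypothetical toy model: three Boolean features, no constraints (an assumption).
FEATURES = (1, 2, 3)
VALID = [dict(zip(FEATURES, bits)) for bits in product([False, True], repeat=3)]

def covers(cfg, interaction):
    """Literal k means feature k is selected, -k means it is deselected."""
    return all(cfg[abs(lit)] == (lit > 0) for lit in interaction)

def feasible_pairs(valid):
    """All pairwise interactions contained in at least one valid configuration."""
    lits = [sign * f for f in FEATURES for sign in (1, -1)]
    return [p for p in combinations(lits, 2)
            if abs(p[0]) != abs(p[1]) and any(covers(c, p) for c in valid)]

def is_sample(sample, valid):
    """Primal feasibility: every feasible pair is covered by some configuration."""
    return all(any(covers(c, p) for c in sample) for p in feasible_pairs(valid))

def is_mutually_exclusive(interactions, valid):
    """Dual feasibility: each interaction is feasible, and no valid
    configuration covers two of them at once."""
    each_feasible = all(any(covers(c, p) for c in valid) for p in interactions)
    clash = any(covers(c, a) and covers(c, b)
                for a, b in combinations(interactions, 2) for c in valid)
    return each_feasible and not clash

# Primal solution x*: four configurations.
sample = [dict(zip(FEATURES, bits)) for bits in
          [(True, True, True), (True, False, False),
           (False, True, False), (False, False, True)]]
# Dual solution y*: the initial tuple set quoted in the Figure 1 caption.
exclusive = [(1, 2), (1, -2), (-1, 3), (-1, -3)]

assert is_sample(sample, VALID)
assert is_mutually_exclusive(exclusive, VALID)
# Equal objective values: by Lemma 1 the sample is minimum for this toy model.
assert len(sample) == len(exclusive)
```

Brute-force enumeration of all valid configurations only works for tiny models; it is used here purely to keep the example self-contained.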

Figures (5)

Figure 1: Example of the lower bound computation with the Large Neighborhood Search for lower bounds. An edge in the compatibility graph indicates that the two interactions are compatible and can appear in the same configuration. A set of interactions without any edges, as in (b), is mutually exclusive, and its size is a lower bound on the necessary number of configurations. In this example, the initial set of mutually exclusive tuples $\{\{1,2\}, \{1, \textrm{-}2\}, \{\textrm{-}1, 3\}, \{\textrm{-}1, \textrm{-}3\}\}$ increases to $\{\{1,2\}, \{1, \textrm{-}2\}, \{\textrm{-}1, \textrm{-}3\}, \{2,3\}, \{\textrm{-}2, 3\}\}$.
Figure 2: Examples of the coverage over the number of configurations. Each plot illustrates how coverage increases with the number of configurations: for instance, a value of 90% at 10 indicates that the first ten configurations in a randomly ordered sample would cover 90% of all feasible interactions. These samples were computed using YASA [KTS+:VaMoS20]. The rapid growth in coverage shows that even removing a large fraction of configurations leaves only a minor portion of interactions uncovered, which is crucial for the size of our constraint program. With few interactions becoming uncovered, the task of optimally recovering these through the constraint program becomes manageable. By shuffling the sample, we can repeatedly explore replacing large subsets of the sample with as few configurations as possible while achieving equivalent coverage.
Figure 3: Differences between the sample sizes of various sampling algorithms (with a 900s time limit) and the best lower bound computed by SampLNS. Not all algorithms were able to compute a feasible sample within the time limit; the number of successfully solved models is therefore given in parentheses. We can see that SampLNS never timed out, and 75% of its samples are at most 10% above the lower bound. More than 55% of its samples match the lower bound and are thus minimum. The next best algorithm is YASA with $m=10$ (resp. the 15-variant), with a median difference of 46% (resp. 42%). Incling yields the largest samples, with a median of twice the size of the lower bound.
Figure 4: Convergence of lower and upper bound over time for a selection of SampLNS runs. Values are relative to the best lower bound of the model, so lower and upper bound meeting at 100% indicates provable optimality. Both bounds usually make quick progress within the first seconds and minutes. The crosses on a curve indicate the end of an upper bound optimization step. The lower bound is only queried at those same points in time, so not all of its increments are shown.
Figure 5: Analogous to Figure 3, but comparing different versions of SampLNS in which the initial sample was produced by different algorithms, to assess how strongly SampLNS depends on the size of the initial sample. ICPL and Incling have significantly larger initial samples than YASA ($m=10$), as can be seen from the blue boxes, but SampLNS can still reduce them to a similar size, as can be seen from the orange boxes. For YASA ($m=1$) and YASA ($m=10$), the optimized samples are nearly identical, despite a visible difference in the initial samples. This indicates that the size of the initial sample is of low importance for SampLNS.
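The captions of Figures 2 and 5 describe the upper bound search at a high level: remove part of a shuffled sample, re-cover the interactions that became uncovered with as few configurations as possible, and repeat. The following is a minimal sketch of that destroy-and-repair loop under stated assumptions. The toy model (three Boolean features without constraints) is hypothetical, all function and parameter names are illustrative, and the repair step uses brute-force enumeration as a stand-in for the constraint program mentioned in the Figure 2 caption. It is not the SampLNS implementation.

```python
import random
from itertools import combinations, product

def pairs_of(cfg):
    """Pairwise interactions of a configuration (tuple of bools) as signed-literal pairs."""
    lits = [i + 1 if on else -(i + 1) for i, on in enumerate(cfg)]
    return set(combinations(lits, 2))

def covered(sample):
    """Union of the pairwise interactions of all configurations in the sample."""
    return set().union(*(pairs_of(c) for c in sample))

def repair(missing, valid, max_size):
    """Smallest set of valid configurations covering `missing`.
    Brute force over subsets; a stand-in for an exact solver."""
    for k in range(max_size + 1):
        for cand in combinations(valid, k):
            if missing <= covered(cand):
                return list(cand)
    return None

def lns(sample, valid, iterations=50, destroy=3, seed=0):
    """Repeatedly drop `destroy` configurations and re-cover what is missing."""
    rng = random.Random(seed)
    target = covered(valid)  # all feasible pairwise interactions
    best = list(sample)
    for _ in range(iterations):
        rng.shuffle(best)
        removed, kept = best[:destroy], best[destroy:]
        missing = target - covered(kept)
        fix = repair(missing, valid, len(removed))
        # Accept the repair only if it makes the sample strictly smaller.
        if fix is not None and len(kept) + len(fix) < len(best):
            best = kept + fix
    return best

# Hypothetical toy model: every assignment of three features is valid.
valid = list(product([False, True], repeat=3))
result = lns(valid, valid)  # start from the full set of eight configurations
print(len(valid), "->", len(result))
```

Every accepted repair re-covers exactly the interactions lost by the removal, so the returned sample always covers all feasible pairs. On this toy model no sample can have fewer than four configurations, because the four tuples $\{1,2\}, \{1,\textrm{-}2\}, \{\textrm{-}1,3\}, \{\textrm{-}1,\textrm{-}3\}$ from the Figure 1 caption are mutually exclusive when there are no constraints.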

Theorems & Definitions (5)

Definition 1
Lemma 1
proof
Theorem 1
proof
