Efficient Construction of Large Search Spaces for Auto-Tuning
Floris-Jan Willemsen, Rob V. van Nieuwpoort, Ben van Werkhoven
TL;DR
We address a bottleneck in auto-tuning: constructing large, constrained search spaces can be prohibitively costly. We reformulate this problem as a Constraint Satisfaction Problem (CSP) and add a runtime parser that translates user constraints into solver-ready expressions. The method integrates an optimized backtracking CSP solver, which enumerates all solutions, into Kernel Tuner, aided by constraint-specific enhancements, C extensions for speed, and flexible output formats. Formally, given variables $X$, domains $D$, and constraints $C$, solving the CSP $\mathcal{P}=(X,D,C)$ yields the set of feasible configurations $\mathcal{V}$, over which auto-tuning seeks the best-performing configuration $v^\star = \underset{v\in\mathcal{V}}{\text{arg max}}\, f_{H_j,I_k}(A_i)$. Across synthetic and eight real-world benchmarks, the optimized CSP approach constructs search spaces four orders of magnitude faster than brute force, three orders of magnitude faster than an unoptimized CSP solver, and one to two orders of magnitude faster than chain-of-trees-based auto-tuning frameworks. This enables sub-second search-space construction and the exploration of previously unattainable problem scales. The contributions are released as open-source packages (Kernel Tuner and python-constraint), providing a robust, scalable, and accessible solution for constraint-based auto-tuning.
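To make the CSP formulation concrete, the following is a minimal plain-Python sketch, not the python-constraint or Kernel Tuner implementation. It contrasts brute-force enumeration (build the full Cartesian product of $D$, then filter by $C$) with a backtracking solver that enumerates all solutions of $\mathcal{P}=(X,D,C)$ while discarding infeasible partial assignments early. The helper names `brute_force` and `backtrack`, the `(scope, predicate)` constraint representation, and the parameter names and limits are hypothetical, chosen only for illustration.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: (names of the variables it reads, predicate over them).
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]
Config = Dict[str, int]


def brute_force(domains: Dict[str, Sequence[int]], constraints: List[Constraint]) -> List[Config]:
    """Baseline: enumerate the full Cartesian product, then filter."""
    names = list(domains)
    feasible = []
    for values in product(*(domains[n] for n in names)):
        cfg = dict(zip(names, values))
        if all(pred(*(cfg[v] for v in scope)) for scope, pred in constraints):
            feasible.append(cfg)
    return feasible


def backtrack(domains: Dict[str, Sequence[int]], constraints: List[Constraint]) -> List[Config]:
    """Enumerate all CSP solutions, pruning inconsistent partial assignments."""
    names = list(domains)
    position = {n: i for i, n in enumerate(names)}
    # Attach each constraint to the variable that completes its scope,
    # so it is checked as soon as all of its inputs are assigned.
    checks_at: Dict[str, List[Constraint]] = {n: [] for n in names}
    for scope, pred in constraints:
        checks_at[max(scope, key=position.__getitem__)].append((scope, pred))

    solutions: List[Config] = []
    assignment: Config = {}

    def extend(i: int) -> None:
        if i == len(names):
            solutions.append(dict(assignment))
            return
        name = names[i]
        for value in domains[name]:
            assignment[name] = value
            if all(pred(*(assignment[v] for v in scope)) for scope, pred in checks_at[name]):
                extend(i + 1)  # only descend into consistent partial assignments
        assignment.pop(name, None)

    extend(0)
    return solutions


# Hypothetical tunable parameters and restrictions, for illustration only.
domains = {
    "block_size_x": [16, 32, 64, 128, 256, 512, 1024],
    "block_size_y": [1, 2, 4, 8, 16, 32],
    "tile_size": [1, 2, 4, 8],
}
constraints: List[Constraint] = [
    (("block_size_x", "block_size_y"), lambda x, y: x * y <= 1024),
    (("block_size_y", "tile_size"), lambda y, t: y * t <= 32),
]

space = backtrack(domains, constraints)
total = math.prod(len(d) for d in domains.values())
print(len(space), "of", total, "configurations are feasible")  # → 92 of 168 configurations are feasible
```

Both functions return the same feasible set $\mathcal{V}$. The difference is the work done: brute force materializes every combination, while backtracking abandons a branch as soon as a constraint over already-assigned variables fails, so whole subtrees of the product are never visited. On real search spaces with many parameters this pruning is what separates the two approaches.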
Abstract
Automatic performance tuning, or auto-tuning, accelerates high-performance codes by exploring vast spaces of code variants. However, due to the large number of possible combinations and complex constraints, constructing these search spaces can be a major bottleneck. In real-world applications, search space construction has been observed to take minutes, hours, or even days. Current state-of-the-art techniques for search space construction, such as chain-of-trees, lack a formal foundation and perform adequately only on a specific subset of search spaces. We show that search space construction for constraint-based auto-tuning can be reformulated as a Constraint Satisfaction Problem (CSP). Building on this insight, we use a CSP solver and develop a runtime parser that translates user-defined constraint functions into solver-optimal expressions, optimize the solver to exploit common structures in auto-tuning constraints, and integrate these and other advances in open-source tools. These contributions substantially improve performance and accessibility while preserving flexibility. We evaluate our approach on a diverse set of benchmarks, demonstrating that our optimized solver reduces construction time by four orders of magnitude versus brute-force enumeration, three orders of magnitude versus an unoptimized CSP solver, and one to two orders of magnitude versus leading auto-tuning frameworks built on chain-of-trees. We thus eliminate a critical scalability barrier for auto-tuning and provide a drop-in solution that enables the exploration of previously unattainable problem scales in auto-tuning and related domains.
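One idea behind translating user-defined constraints into solver-friendly expressions can be sketched as follows, assuming constraints arrive as Python expression strings over tunable parameter names. This is a hypothetical simplification, not Kernel Tuner's parser: it splits a top-level `and` into separate constraints with smaller scopes (so a backtracking solver can check each one as early as possible) and compiles each part into a predicate over exactly the parameters it references. The helper `parse_restrictions` and the `(scope, predicate)` representation are illustrative assumptions.

```python
import ast
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical representation: (parameter names the constraint reads, predicate over them).
Constraint = Tuple[Tuple[str, ...], Callable[..., bool]]


def _conjuncts(node: ast.expr) -> Iterator[ast.expr]:
    """Yield the operands of a (possibly nested) top-level `and`."""
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        for operand in node.values:
            yield from _conjuncts(operand)
    else:
        yield node


def parse_restrictions(restrictions: Sequence[str], tunable: Sequence[str]) -> List[Constraint]:
    """Turn constraint strings into small-scope predicates (requires Python 3.9+ for ast.unparse)."""
    parsed: List[Constraint] = []
    for text in restrictions:
        body = ast.parse(text, mode="eval").body
        for part in _conjuncts(body):
            referenced = {n.id for n in ast.walk(part) if isinstance(n, ast.Name)}
            # Keep the declared parameter order so predicate arguments line up predictably.
            scope = tuple(name for name in tunable if name in referenced)
            source = f"lambda {', '.join(scope)}: {ast.unparse(part)}"
            # eval executes arbitrary code: only use this with trusted constraint strings.
            predicate = eval(compile(source, "<restriction>", "eval"), {})
            parsed.append((scope, predicate))
    return parsed


params = ["block_size_x", "block_size_y", "tile_size"]
rules = parse_restrictions(
    ["block_size_x * block_size_y <= 1024 and block_size_y * tile_size <= 32"], params
)
for scope, predicate in rules:
    print(scope)
# → ('block_size_x', 'block_size_y')
# → ('block_size_y', 'tile_size')
```

Splitting the conjunction matters because a single combined predicate over all three parameters could only be evaluated once every parameter is assigned, whereas the two smaller constraints can each reject a partial configuration as soon as their own two parameters are set.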
