CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization

Jacob O. Tørring, Carl Hvarfner, Luigi Nardi, Magnus Själander

TL;DR

CATBench addresses the lack of standardized benchmarks for Bayesian optimization in compiler autotuning by introducing a comprehensive, containerized benchmarking suite extended from BaCO. It combines real-world tasks (TACO and RISE/ELEVATE) with multi-fidelity and multi-objective evaluation, plus a scalable client-server interface using gRPC and Docker for reproducible experimentation. Empirical results show BaCO outperforms non-BO baselines on single-objective TACO and highlights robust Pareto-front structure for RISE/ELEVATE, while revealing hardware-induced variability and informative feature-importance patterns. The suite enables reproducible benchmarking, transfer learning studies, and rapid algorithm development, with clear pathways for expansion and community-driven leaderboards.
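To make the client-server workflow concrete, the short sketch below shows how an optimization loop might query a containerized CATBench task over the gRPC interface. This is a minimal sketch under assumed names: the catbench package, the benchmark() and query() calls, the "spmv" task identifier, and the chunk_size parameter are illustrative, not the suite's verified API; only omp_num_threads and the compute-time objective appear in the paper's figures.

# Illustrative sketch only: catbench.benchmark(), bench.query(), the "spmv"
# task id, and the chunk_size parameter are assumed names, not a verified API.
import catbench  # hypothetical Python client wrapping the gRPC/Docker server

# Connect to a containerized benchmark server exposing a TACO task.
bench = catbench.benchmark("spmv")

# Propose a configuration (a real BO loop would pick this via its acquisition function).
config = {"omp_num_threads": 8, "chunk_size": 16}

# The configuration is evaluated remotely inside the container; the reply
# carries the measured objectives, e.g. compute time.
result = bench.query(config)
print(result["compute_time"])

Because the benchmark runs behind a network interface, the optimizer and the measured hardware can live on different machines, which is what enables the reproducible, containerized setup described above.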

Abstract

Bayesian optimization is a powerful method for automating the tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine learning-oriented computations, from tensor algebra to image processing and clustering, and use state-of-the-art compilers, such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.

Paper Structure

This paper contains 25 sections, 10 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: The CATBench network-based client-server model.
  • Figure 2: Minimum average compute time including two standard errors per optimization algorithm on the TACO optimization tasks on XeonE5. BaCO substantially outperforms non-BO algorithms. NSGA-2 encountered numerical errors on SDDMM and MTTKRP.
  • Figure 3: Hypervolume improvement on compute time and GPU energy consumption from a reference point, including two standard errors per optimization algorithm on the RISE/ELEVATE optimization tasks on RTXTitan. BaCO substantially outperforms non-BO algorithms.
  • Figure 4: Empirical density of speedups for SPMV, SDDMM, and Stencil across different sets of hardware. The distribution of output is similar across tasks, suggesting that a change in hardware yields a similar, albeit not identical, task.
  • Figure 5: (left) Feature importance for both objectives on the Stencil benchmark run on a TitanRTX (RTX) and a TitanV (TV). (right) Feature importance for compute time for the SPMM benchmark across hardware. The omp_num_threads parameter is the most important on both Epyc and XeonE5, while it is only marginally impactful on XeonG. While all objectives are fairly sparse, the feature importances can vary substantially across hardware.
  • ...and 3 more figures