MFTune: An Efficient Multi-fidelity Framework for Spark SQL Configuration Tuning

Beicheng Xu, Lingching Tung, Yuchen Wang, Yupeng Lu, Bin Cui

Abstract

Apache Spark SQL is a cornerstone of modern big data analytics. However, optimizing Spark SQL performance is challenging due to its vast configuration space and the prohibitive cost of evaluating massive workloads. Existing tuning methods predominantly rely on full-fidelity evaluations, which are extremely time-consuming and often lead to suboptimal performance within practical budgets. While multi-fidelity optimization offers a potential solution, directly applying standard techniques, such as data volume reduction or early stopping, proves ineffective for Spark SQL because they fail to preserve performance correlations or represent true system bottlenecks. To address these challenges, we propose MFTune, an efficient multi-fidelity framework that introduces a query-based fidelity partitioning strategy, utilizing representative SQL subsets to provide accurate, low-cost proxies. To navigate the huge search space, MFTune incorporates a density-based optimization mechanism for automated knob and range compression, alongside an adapted transfer learning approach and a two-phase warm start to further accelerate the tuning process. Experimental results on the TPC-H and TPC-DS benchmarks demonstrate that MFTune significantly outperforms five state-of-the-art tuning methods, identifying superior configurations within practical time constraints.

Paper Structure

This paper contains 32 sections, 8 equations, 6 figures, 3 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Analysis of the multi-fidelity mechanism on TPC-DS (600GB). (a) Performance distribution of configurations evaluated within a 48-hour budget, comparing MFTune against a single-fidelity variant. (b) Fidelity correlation. We evaluate the representativeness of three proxy methods on 50 configurations randomly sampled from the search space. The proxies are: 1) Data Volume: scaling the dataset across {30, 100, 200, 400, 600} GB; 2) SQL Early Stop: executing the first {1/27, 1/9, 1/3, 2/3, 1} of the total SQLs; and 3) SQL Selection: selecting representative SQL subsets at fidelity levels from 1/27 to 1, as described in Section \ref{sec:fidelity_partition}. Each point shows the Kendall’s Tau correlation between proxy and full-fidelity performance, plotted against the average latency ratio (proxy latency relative to full-fidelity latency) across the 50 configurations.
  • Figure 2: Overview: architecture and workflow of MFTune.
  • Figure 3: Comparison of tuning performance over time across six scenarios, encompassing two benchmarks and three experimental settings. Each curve represents a method's cumulative best latency, averaged over three independent experimental runs. The vertical red dashed line marks the time at which MFTune activates multi-fidelity optimization in cases where fidelity partitioning is not applicable at the beginning.
  • Figure 4: Generalization performance of transfer-learning-based tuning methods across diverse settings. Results represent the speedup of the optimal latency achieved within 48 hours relative to the default Spark configuration.
  • Figure 5: Effectiveness of the multi-fidelity optimization mechanism on the TPC-DS benchmark. (a) Ablation study on TPC-DS (600GB) using Hardware A. (b) Fidelity correlation across 16 TPC-DS workloads. Each bar compares the Kendall’s Tau correlation with full fidelity for our SQL Selection method (partitioned via historical observations) versus the Data Volume baseline.
  • ...and 1 more figure
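
The fidelity-correlation measurement described in the captions of Figures 1(b) and 5(b) can be illustrated with a minimal sketch. It computes Kendall's Tau between proxy and full-fidelity latencies for a set of sampled configurations, together with the average latency ratio. The function names and the latency values are hypothetical, the tau-a variant (no tie correction) is an assumption, and so is the reading of "average latency ratio" as the mean of per-configuration ratios; the paper's own implementation may differ.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Assumption: no tie correction is applied; tied pairs count as neither
    concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def mean_latency_ratio(proxy, full):
    """Mean of per-configuration ratios proxy_latency / full_latency.

    Assumption: the average is taken over per-configuration ratios rather
    than as a ratio of mean latencies.
    """
    if len(proxy) != len(full) or not full:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(p / f for p, f in zip(proxy, full)) / len(full)


# Hypothetical latencies (seconds) for five sampled configurations.
full_fidelity = [100.0, 120.0, 90.0, 150.0, 110.0]
proxy = [10.0, 12.0, 9.5, 14.0, 13.0]

tau = kendall_tau(proxy, full_fidelity)
ratio = mean_latency_ratio(proxy, full_fidelity)
print(f"Kendall's tau = {tau:.2f}, mean latency ratio = {ratio:.3f}")
```

A proxy is attractive when it sits toward high tau and low latency ratio: it ranks configurations almost the same way as the full workload while costing a small fraction of the evaluation time.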