PerfGen: Automated Performance Benchmark Generation for Big Data Analytics

Jiyuan Wang, Jason Teoh, Muhammad Ali Gulzar, Qian Zhang, Miryung Kim

TL;DR

PerfGen addresses the difficulty of debugging performance issues in data-intensive scalable computing (DISC) by automatically generating inputs that trigger performance skews. It introduces a phased fuzzing workflow, customizable performance-monitor templates, and skew-inspired mutations, and uses GPT-4-generated pseudo-inverse functions to convert intermediate inputs into end-to-end program inputs. Across four Spark-based case studies, PerfGen achieves substantial speedups (averaging around 43x) and needs less than 0.004% of the iterations required by baseline fuzzing, demonstrating practical effectiveness for reproducing data, memory, and computational skews. The approach significantly accelerates performance debugging in big data analytics and is generalizable beyond Spark to other DISC platforms such as MapReduce and Hadoop.

Abstract

Many symptoms of poor performance in big data analytics, such as computational skews, data skews, and memory skews, are input dependent. However, because inputs that trigger such symptoms are rarely available, big data analytics are hard to debug and test. We design PerfGen to automatically generate inputs for the purpose of performance testing. PerfGen overcomes three challenges that arise when automated fuzz testing is applied naively to performance testing. First, typical grey-box fuzzing relies on coverage as a guidance signal and thus is unlikely to trigger interesting performance behavior. Therefore, PerfGen provides performance monitor templates that a user can extend to serve as a set of guidance metrics for grey-box fuzzing. Second, performance symptoms may occur at an intermediate or later stage of a big data analytics pipeline. Thus, PerfGen uses a phased fuzzing approach. This approach first identifies symptom-causing inputs at an intermediate stage and then converts them to inputs at the beginning of the program with a pseudo-inverse function generated by a large language model. Third, PerfGen defines sets of skew-inspired input mutations, which increase the chance of inducing performance problems. We evaluate PerfGen using four case studies. PerfGen achieves at least 11x speedup compared to a traditional fuzzing approach when generating inputs to trigger performance symptoms. Additionally, identifying intermediate inputs first and then converting them to original inputs enables PerfGen to generate such workloads in less than 0.004% of the iterations required by a baseline approach.
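The three ideas in the abstract (a monitor used as the guidance metric, a skew-inspired mutation, and phased fuzzing with a pseudo-inverse) can be illustrated with a toy Python sketch. This is not PerfGen's API or implementation: the two-stage program (`parse`, `partition_sizes`), the monitor `data_skew`, the mutation `duplicate_hot_key`, and the hand-written `pseudo_inverse` are all hypothetical names chosen for illustration, and in the paper the pseudo-inverse is generated by a large language model rather than written by hand.

```python
import random
from collections import Counter

def parse(lines):
    """Stage 1 (hypothetical): turn raw 'key,value' lines into (key, value) pairs."""
    return [(int(k), int(v)) for k, v in (line.split(",") for line in lines)]

def partition_sizes(pairs, num_partitions=4):
    """Stage 2 (hypothetical): partition pairs by key; return records per partition."""
    sizes = [0] * num_partitions
    for key, _ in pairs:
        sizes[key % num_partitions] += 1
    return sizes

def data_skew(sizes):
    """Monitor (assumed metric): largest partition relative to the mean partition size."""
    return max(sizes) / (sum(sizes) / len(sizes))

def duplicate_hot_key(pairs, rng):
    """Skew-inspired mutation (simplified): give a random record the most frequent key."""
    hot_key = Counter(k for k, _ in pairs).most_common(1)[0][0]
    mutated = list(pairs)
    i = rng.randrange(len(mutated))
    mutated[i] = (hot_key, mutated[i][1])
    return mutated

def fuzz_intermediate(seed_pairs, threshold, max_iters=1000, seed=0):
    """Phase 1: fuzz the input of the intermediate stage, guided by the monitor."""
    rng = random.Random(seed)
    best = list(seed_pairs)
    best_score = data_skew(partition_sizes(best))
    for iteration in range(1, max_iters + 1):
        candidate = duplicate_hot_key(best, rng)
        score = data_skew(partition_sizes(candidate))
        if score >= best_score:          # keep candidates that do not reduce skew
            best, best_score = candidate, score
        if best_score >= threshold:
            return best, iteration
    return None, max_iters

def pseudo_inverse(pairs):
    """Phase 2 helper (hand-written here): map intermediate pairs back to raw lines."""
    return [f"{k},{v}" for k, v in pairs]

# Phase 1: find a skewed intermediate input starting from a balanced seed.
seed_pairs = [(k, 1) for k in range(8)]
skewed_pairs, iterations = fuzz_intermediate(seed_pairs, threshold=3.0)

# Phase 2: convert it to a whole-program input and re-check the monitor end to end.
raw_input = pseudo_inverse(skewed_pairs)
end_to_end_skew = data_skew(partition_sizes(parse(raw_input)))
print(iterations, end_to_end_skew)
```

The point of the design is that the mutation operates directly on the stage where the symptom appears, so the fuzzer does not have to discover a raw input that survives parsing and also happens to be skewed.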

Paper Structure

This paper contains 16 sections, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Three sources of performance skews
  • Figure 2: The Collatz program which applies the solve_collatz function to each input integer and sums the result by distinct integer input.
  • Figure 3: The solve_collatz function used in Figure 2 to determine each integer's Collatz sequence length and compute a polynomial-time result based on the sequence length. For example, an input of 3 has a Collatz length of 7 and calling solve_collatz(3) takes 1 ms to compute, while an input of 27 has a length of 111 and takes 4.9 s to compute.
  • Figure 4: A pseudo-inverse function to convert solved inputs into inputs for the entire Collatz program (Figure 2, lines 1-7).
  • Figure 5: Code demonstrating how a user can configure PerfGen for the Collatz program
  • ...and 8 more figures
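
The Collatz lengths quoted in the Figure 3 caption can be checked with a short Python sketch. It only counts the steps needed to reach 1. The polynomial-time computation that solve_collatz performs on top of that length is not described in this summary and is not reproduced here, and `collatz_length` is a hypothetical helper name rather than a function from the paper.

```python
def collatz_length(n: int) -> int:
    """Count the steps for n to reach 1 under the Collatz rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; map odd numbers to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_length(3))   # 7
print(collatz_length(27))  # 111
```

Because the per-record cost grows with this length, a few inputs such as 27 can dominate the running time of a task, which is the kind of computational skew the caption illustrates.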