PerfGen: Automated Performance Benchmark Generation for Big Data Analytics
Jiyuan Wang, Jason Teoh, Muhammad Ali Gulzar, Qian Zhang, Miryung Kim
TL;DR
PerfGen addresses the difficulty of debugging performance problems in data-intensive scalable computing (DISC) by automatically generating inputs that trigger performance skews. It combines a phased fuzzing workflow, customizable performance-monitor templates, and skew-inspired mutations, and it uses GPT-4-generated pseudo-inverse functions to convert symptom-causing intermediate inputs into end-to-end program inputs. Across four Spark-based case studies, PerfGen achieves speedups averaging about 43x over baseline fuzzing and needs less than 0.004% of the baseline's iterations, which demonstrates practical effectiveness for reproducing data, memory, and computational skews. The approach accelerates performance debugging in big data analytics and is generalizable beyond Spark to other DISC platforms such as MapReduce and Hadoop.
Abstract
Many symptoms of poor performance in big data analytics, such as computational skews, data skews, and memory skews, are input dependent. However, because inputs that trigger such symptoms are rarely available, big data analytics is hard to debug and test. We design PerfGen to automatically generate inputs for performance testing. PerfGen overcomes three challenges that arise when automated fuzz testing is naively applied to performance testing. First, typical grey-box fuzzing relies on coverage as a guidance signal and thus is unlikely to trigger interesting performance behavior. Therefore, PerfGen provides performance monitor templates that a user can extend to serve as a set of guidance metrics for grey-box fuzzing. Second, performance symptoms may occur at an intermediate or later stage of a big data analytics pipeline. Thus, PerfGen uses a phased fuzzing approach: it first identifies symptom-causing inputs at an intermediate stage and then converts them to inputs at the beginning of the program with a pseudo-inverse function generated by a large language model. Third, PerfGen defines sets of skew-inspired input mutations, which increase the chance of inducing performance problems. We evaluate PerfGen using four case studies. PerfGen achieves at least an 11x speedup compared to a traditional fuzzing approach when generating inputs to trigger performance symptoms. Additionally, identifying intermediate inputs first and then converting them to original inputs enables PerfGen to generate such workloads in less than 0.004% of the iterations required by a baseline approach.
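The three ideas in the abstract (a performance monitor as the guidance metric, skew-inspired mutations, and phased fuzzing with a pseudo-inverse) can be illustrated with a small toy sketch. This is not PerfGen's implementation and it does not use Spark: the two-stage pipeline, the names `parse`, `skew_ratio`, `replicate_key`, `fuzz_intermediate`, and `pseudo_inverse`, the partition count, and the threshold are all hypothetical choices made for illustration. In particular, the pseudo-inverse here is written by hand, whereas the paper generates it with a large language model.

```python
import random
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count for the toy shuffle


def parse(lines):
    """Stage 1: turn 'key,value' text lines into (key, value) pairs."""
    pairs = []
    for line in lines:
        key, value = line.split(",")
        pairs.append((key, int(value)))
    return pairs


def partition_sizes(pairs):
    """Stage 2: hash-partition pairs by key and count records per partition."""
    sizes = [0] * NUM_PARTITIONS
    for key, _ in pairs:
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        sizes[zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS] += 1
    return sizes


def skew_ratio(pairs):
    """Monitor (assumed metric): largest partition size over the mean size."""
    sizes = partition_sizes(pairs)
    return max(sizes) / (sum(sizes) / len(sizes))


def replicate_key(pairs, rng):
    """Skew-inspired mutation: copy one record's key onto another record."""
    mutated = list(pairs)
    hot_key = rng.choice(mutated)[0]
    index = rng.randrange(len(mutated))
    mutated[index] = (hot_key, mutated[index][1])
    return mutated


def fuzz_intermediate(seed_pairs, threshold, max_iters, rng):
    """Phase 1: fuzz the intermediate (stage 2) input, guided by the monitor."""
    best = seed_pairs
    for iteration in range(1, max_iters + 1):
        candidate = replicate_key(best, rng)
        # Keep candidates that do not reduce the monitored skew.
        if skew_ratio(candidate) >= skew_ratio(best):
            best = candidate
        if skew_ratio(best) >= threshold:
            return best, iteration
    return None, max_iters


def pseudo_inverse(pairs):
    """Phase 2: map intermediate pairs back to stage 1 input lines."""
    return [f"{key},{value}" for key, value in pairs]


if __name__ == "__main__":
    rng = random.Random(0)
    seed_lines = [f"k{i},{i}" for i in range(8)]
    skewed, iterations = fuzz_intermediate(parse(seed_lines), 3.0, 1000, rng)
    if skewed is not None:
        print(iterations, pseudo_inverse(skewed))
```

The sketch mutates the intermediate key-value pairs directly, so each mutation changes the monitored metric without first having to pass through the parsing stage. Only once a skewed intermediate input is found does it map that input back to text lines. In this toy case `parse(pseudo_inverse(pairs))` reproduces `pairs` exactly; a pseudo-inverse for a real pipeline would generally only approximate that round trip.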
