Enriching the Machine Learning Workloads in BigBench
Matthias Polag, Todor Ivanov, Timo Eichhorn
TL;DR
This work addresses the lack of end-to-end ML benchmarks in big data platforms by extending BigBench V2 with three new ML workloads (M1–M3) and additional tasks (Q26, Q28) implemented across multiple libraries (MLlib, SystemML, Scikit-learn, Pandas). It provides a cross-library evaluation framework that contrasts different implementations of the same algorithms (e.g., K-Means, GMM, FP-Growth, Eclat, LDA, SVM, MLP) and assesses scalability on a 4-node cluster using synthetic BigBench V2 data. Experimental results reveal diverse performance and scalability profiles across workloads and libraries, highlighting strengths of SystemML in cluster contexts and the memory challenges encountered by certain ML tasks at large scale factors. The findings demonstrate the value of end-to-end ML benchmarks for informing library choice and integration strategies, with future work pointing to workflow management and deployment solutions such as MLflow or Kubeflow.
Abstract
In the era of Big Data and with growing support for Machine Learning, Deep Learning and Artificial Intelligence algorithms in current software systems, there is an urgent need for standardized application benchmarks that stress-test and evaluate these new technologies. Building on the standardized BigBench (TPCx-BB) benchmark, this work enriches the improved BigBench V2 with three new workloads and expands its coverage of machine learning algorithms. Our workloads utilize multiple algorithms and compare different implementations of the same algorithm across several popular libraries such as MLlib, SystemML, Scikit-learn and Pandas, demonstrating the relevance and usability of our benchmark extension.
