Performance analysis of mdx II: A next-generation cloud platform for cross-disciplinary data science research
Keichi Takahashi, Tomonori Hayami, Yu Mukaizono, Yuki Teramae, Susumu Date
TL;DR
This paper evaluates mdx II, an OpenStack-based IaaS cloud designed for cross-disciplinary data science research in Japan. It benchmarks a 16-vCPU mdx II VM against a comparable AWS instance and uses a 224-vCPU VM occupying an entire host to quantify virtualization overhead. A broad suite of benchmarks (LINPACK, BabelStream, iPerf, fio, warp, SPEChpc, PDS) characterizes compute, memory, network, and storage performance. Key findings show that mdx II often surpasses AWS in compute and memory throughput and that its Lustre-backed I/O delivers strong data analytics performance, while memory-bound workloads incur notable virtualization overhead. The work provides practical guidance for users and operators and offers design insights for future data-centric cloud platforms, such as the importance of enabling virtio multiqueue for network throughput. Overall, the study demonstrates mdx II's potential to support high-performance data analytics workloads and cross-institution collaborations, informing both deployment choices and platform design.
Abstract
mdx II is an Infrastructure-as-a-Service (IaaS) cloud platform designed to accelerate data science research and foster cross-disciplinary collaborations among universities and research institutions in Japan. Unlike traditional high-performance computing systems, mdx II leverages OpenStack to provide customizable and isolated computing environments consisting of virtual machines, virtual networks, and advanced storage. This paper presents a comprehensive performance evaluation of mdx II, including a comparison with Amazon Web Services (AWS). We evaluated a 16-vCPU VM along multiple dimensions, including floating-point computing performance, memory throughput, network throughput, file system and object storage performance, and real-world application performance. Compared with a 16-vCPU AWS instance, mdx II outperformed AWS in many respects, demonstrating that it holds significant promise for high-performance data analytics (HPDA) workloads. We also evaluated the virtualization overhead using a 224-vCPU VM occupying an entire host. The results suggest that the virtualization overhead is minimal for compute-intensive benchmarks, whereas memory-intensive benchmarks experience larger overheads. These findings are expected to help users of mdx II obtain high performance for their data science workloads and to offer insights to the designers of future data-centric cloud platforms.
