Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
Ying Mao, Yuqi Fu, Suwen Gu, Wenrui Mu, Long Cheng, Qingzhi Liu
TL;DR
This study investigates resource management schemes for cloud-native platforms running Docker and Kubernetes, focusing on containerized deep learning and big-data workloads. It builds a real-time monitoring system and conducts extensive single-node and cluster experiments with varied workloads and submission patterns to compare platform behavior. Key findings show that changing the default configuration reduces completion time by up to 79.4%, that differences in resource-management policies between the two platforms increase completion time by up to 96.7%, and that resource release is delayed by up to 116.7%. The results guide developers and operators in selecting hosting platforms and configuring resource management to balance performance and utilization in cloud-native environments.
Abstract
Over the last decade, businesses have increasingly adopted cloud technology and incorporated it into their internal processes. Cloud-based deployment provides on-demand availability without active management. More recently, the concept of the cloud-native application has been proposed, and it represents an invaluable step toward helping organizations develop software faster and update it more frequently to achieve dramatic business outcomes. Cloud-native is an approach to building and running applications that exploits the advantages of the cloud computing delivery model. It is more about how applications are created and deployed than where. Container-based virtualization technologies, such as Docker and Kubernetes, serve as the foundation for cloud-native applications. This paper investigates the performance of two popular computation-intensive applications, big data and deep learning, in a cloud-native environment. We analyze the system overhead and resource usage of these applications. Through extensive experiments, we show that the completion time is reduced by up to 79.4% by changing the default setting and increases by up to 96.7% due to the different resource management schemes on the two platforms. Additionally, resource release is delayed by up to 116.7% across different systems. Our work can guide developers, administrators, and researchers in designing and deploying their applications by selecting and configuring a hosting platform.
