Data analysis of cloud virtualization experiments

Pedro R. X. do Carmo, Eduardo Freitas, Assis T. de Oliveira Filho, Judith Kelner, Djamel Sadok

TL;DR

This paper presents a dataset of active network measurements collected while varying several network and virtualization parameters. It examines how those parameters affect a key network metric, end-to-end latency, and demonstrates the dataset's use in developing machine learning-based systems for administrator decision-making.

Abstract

The cloud computing paradigm underpins data center and telecommunication infrastructure design. Heavily leveraging virtualization, it slices hardware and software resources into smaller software units that are more flexible to manipulate. Given the considerable benefits, several forms of virtualization have emerged, each with different processing and communication overheads, including Full Virtualization and OS Virtualization. As a result, predicting packet throughput at the data plane becomes more challenging because of the additional virtualization overhead on CPU, I/O, and network resources. This research presents a dataset of active network measurements collected while varying several parameters: CPU affinity, frequency of echo packet injection, type of virtual network driver, presence of CPU, I/O, or network load, and the number of concurrent VMs. The virtualization technologies used in the study are KVM, LXC, and Docker. The work examines the impact of these parameters on a key network metric, namely, end-to-end latency. It also builds data models to evaluate the impact of a cloud computing environment on packet round-trip time. To explore data visualization, the dataset was subjected to pre-processing, correlation analysis, dimensionality reduction, and clustering. In addition, this paper provides a brief analysis of the dataset, demonstrating its use in developing machine learning-based systems for administrator decision-making.

Paper Structure (16 sections, 1 equation, 5 figures, 8 tables)

Figures (5)

  • Figure 1: Testbed. Adapted from FILHO202273.
  • Figure 2: Data processing steps
  • Figure 3: Correlation
  • Figure 4: Principal Component Analysis Plot. The X-axis represents the first principal component, and the Y-axis represents the second principal component. The data points represent the samples being analyzed. The vectors indicate the direction of maximum variance. Each color represents an analyzed technology: Docker, KVM, and LXC.
  • Figure 5: K-Means Clustering Plot. The X-axis and Y-axis show the values of two principal components, and each data point is positioned according to its values on these two components. Data points are grouped into 3 clusters, each shown in a different color. The centroid of each cluster is marked by a large dot at the cluster's center.
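
The kind of analysis summarized in Figures 4 and 5 (projection onto two principal components, followed by K-Means with 3 clusters) can be illustrated with a minimal sketch. This is not the authors' code, and it does not use the paper's dataset: the samples are synthetic placeholders, the column meanings (`rtt_ms`, `cpu_load`, `n_vms`) are hypothetical, and the helper names `standardize`, `pca_2d`, and `kmeans` are introduced here for illustration only. PCA is computed with a plain SVD and K-Means with a basic Lloyd iteration, using NumPy alone.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - X.mean(axis=0)) / std


def pca_2d(X):
    """Project centered data onto its first two principal components."""
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k=3, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct samples.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(points, centroids), centroids


# SYNTHETIC placeholder samples, NOT the paper's measurements.
# Hypothetical columns: rtt_ms, cpu_load, n_vms.
rng = np.random.default_rng(42)
group_means = ([0.0, 0.0, 0.0], [4.0, 4.0, 0.0], [0.0, 4.0, 4.0])
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 3)) for m in group_means])

Z = standardize(X)                      # pre-processing
pcs = pca_2d(Z)                         # dimensionality reduction (Figure 4 style)
labels, centroids = kmeans(pcs, k=3)    # clustering (Figure 5 style)
```

Plotting `pcs[:, 0]` against `pcs[:, 1]`, colored by `labels` and with `centroids` overlaid, would give a figure of the same general form as Figure 5. Because the initialization here is random, a production analysis would more likely rely on a library implementation with multiple restarts.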