
Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments

Elayne Lemos, Rodrigo Oliveira, Jairson Rodrigues, Rosalvo F. Oliveira Neto

TL;DR

This study assesses the feasibility of deploying deep learning-based grammatical error correction in low-resource cloud environments across AWS, GCP, and Azure, comparing CPU-only and GPU-enabled configurations. Using the GECToR model and a fixed corpus derived from CoNLL-2014, the authors measure latency, resource usage, and cost under varying levels of concurrency. GPUs deliver the lowest latency but cost, on average, about 300% more than CPU-only deployments. CPU-only deployments with larger processor caches can still reach acceptable latency (often under 2 seconds) at roughly 50% lower cost than GPU-based setups. The results support the viability of GPU-free cloud inference for proofs of concept (PoCs) and offer practical guidance for startups and research groups working under tight budgets.
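Latency under varying concurrency can be approximated with a small client-side load test. The sketch below is an illustration under stated assumptions, not the authors' benchmark harness. `correct_text` is a hypothetical stand-in for a request to a deployed GECToR web service (here it only sleeps). Each thread-pool worker plays the role of one concurrent client. The 2-second figure is used only as an acceptability check.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def correct_text(sentence: str) -> str:
    """Hypothetical stand-in for one request to a deployed GECToR service.

    A real test would send `sentence` over HTTP to the web service layer;
    this stub only sleeps for 10 ms to simulate inference time.
    """
    time.sleep(0.01)
    return sentence


def timed_call(sentence: str) -> float:
    """Return the wall-clock latency, in seconds, of a single request."""
    start = time.perf_counter()
    correct_text(sentence)
    return time.perf_counter() - start


def run_load_test(sentences: list[str], concurrency: int) -> dict[str, float]:
    """Send every sentence using `concurrency` parallel workers.

    Each worker acts as one concurrent client. Latency is measured per
    request, so time spent waiting for a free worker is not included.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, sentences))
    ordered = sorted(latencies)
    # Nearest-rank 95th percentile.
    p95 = ordered[max(0, math.ceil(0.95 * len(ordered)) - 1)]
    return {"mean_s": statistics.mean(latencies), "p95_s": p95, "max_s": ordered[-1]}


if __name__ == "__main__":
    # Placeholder input; the study used a fixed CoNLL-2014-derived corpus.
    corpus = ["She go to school every days ."] * 50
    for level in (1, 5, 10):
        stats = run_load_test(corpus, level)
        print(level, stats, "p95 under 2 s:", stats["p95_s"] < 2.0)
```

Reporting a high percentile alongside the mean matters here because a deployment can meet a latency target on average while its slowest requests still exceed it as concurrency grows.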

Abstract

The deployment of Machine Learning models in the cloud has grown among tech companies. Hardware requirements are higher when these models involve Deep Learning techniques, and cloud providers' costs may be a barrier. We explore the deployment of Deep Learning models across three major cloud providers (Amazon Web Services, Google Cloud Platform, and Microsoft Azure), using GECToR, a Deep Learning solution for Grammatical Error Correction, as the experimental model. We evaluate real-time latency, hardware usage, and cost at each provider in 7 execution environments, with 10 experiments reproduced. We found that while Graphics Processing Units (GPUs) deliver the best performance, they had an average cost 300% higher than solutions without a GPU. Our analysis also suggests that processor cache size is a key variable for CPU-only deployments: setups with sufficient cache achieved a 50% cost reduction compared to GPU-based deployments. This study indicates the feasibility and affordability of cloud-based Deep Learning inference without a GPU, benefiting resource-constrained users such as startups and small research groups.
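Relative cost figures are easy to misread. An average cost 300% higher than a baseline means four times the baseline cost, while a 50% reduction means half. The helpers below make that arithmetic explicit and add a rough cost-per-request estimate.

All prices and latencies in the usage lines are made-up placeholders, not results from the study. `cost_per_1k_requests` is a hypothetical helper. It assumes a saturated instance whose throughput follows Little's law (concurrency divided by mean latency).

```python
def percent_higher(cost: float, baseline: float) -> float:
    """How much more `cost` is than `baseline`, as a percentage of the baseline."""
    return (cost - baseline) / baseline * 100.0


def percent_reduction(cost: float, baseline: float) -> float:
    """How much cheaper `cost` is than `baseline`, as a percentage of the baseline."""
    return (baseline - cost) / baseline * 100.0


def cost_per_1k_requests(hourly_price: float, mean_latency_s: float, concurrency: int) -> float:
    """Approximate cost of serving 1,000 requests on a fully busy instance.

    Assumes throughput = concurrency / mean latency (requests per second),
    i.e. the instance is kept saturated at the given concurrency level.
    """
    requests_per_hour = 3600.0 * concurrency / mean_latency_s
    return hourly_price / requests_per_hour * 1000.0


if __name__ == "__main__":
    # Placeholder hourly prices (USD), NOT figures from the study.
    gpu_hourly, cpu_hourly, cpu_large_cache_hourly = 4.0, 1.0, 2.0
    print(percent_higher(gpu_hourly, cpu_hourly))  # 300.0, i.e. 4x the CPU cost
    print(percent_reduction(cpu_large_cache_hourly, gpu_hourly))  # 50.0
    print(cost_per_1k_requests(cpu_large_cache_hourly, mean_latency_s=1.5, concurrency=4))
```

Normalizing to cost per request, rather than comparing hourly prices alone, lets a cheaper but slower CPU instance be compared fairly with a faster, more expensive GPU instance.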

Paper Structure

This paper contains 16 sections, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Inequality-adjusted Human Development Index by country.
  • Figure 2: Artificial Intelligence publications by country.
  • Figure 3: MLaaS solution layers: client, web service, and ML model.
  • Figure 4: Computational application environments regarding the level of abstraction for the user. Source: RedHat2022.
  • Figure 5: Encoder-decoder neural network architecture. Source: Kostadinov2019.
  • ...and 2 more figures