TF2AIF: Facilitating development and deployment of accelerated AI models on the cloud-edge continuum

Aimilios Leftheriotis, Achilleas Tzenetopoulos, George Lentaris, Dimitrios Soudris, Georgios Theodoridis

TL;DR

TF2AIF tackles the challenge of heterogeneous cloud-edge AI deployment by automating the generation of multiple accelerated AI-inference variants from a high-level TensorFlow input. It offers a vendor-neutral Converter-Composer pipeline that produces platform-specific server and client containers with built-in quantization and pre/post-processing abstractions, enabling rapid deployment across CPU, ARM, FPGA, and GPU accelerators. In evaluations on a Kubernetes cluster, TF2AIF generated dozens of variants within minutes and demonstrated substantial latency speedups over native TensorFlow, validating its potential for design-space exploration and ML-driven scheduling on the cloud-edge continuum. The work enables researchers and operators to benchmark, compare, and orchestrate AI workloads across heterogeneous hardware with minimal expertise and time.

Abstract

The B5G/6G evolution relies on connect-compute technologies and highly heterogeneous clusters with HW accelerators, which require specialized coding to be utilized efficiently. This paper proposes a custom tool that takes an AI function written in a high-level language, e.g., Python TensorFlow, and generates multiple SW versions of it targeting diverse HW+SW platforms. TF2AIF builds upon disparate tool-flows to create a plethora of corresponding containers, enabling the system orchestrator to deploy the requested function on any particular node in the cloud-edge continuum, i.e., to leverage the performance/energy benefits of the underlying HW under any circumstances. TF2AIF fills an identified gap in today's ecosystem and facilitates research on resource management or automated operations by demanding minimal time and expertise from users.
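
To make the one-model-to-many-containers idea concrete, the following is a minimal sketch of the bookkeeping involved, not TF2AIF's actual interface. It assumes one variant per platform family named in the TL;DR (CPU, ARM, FPGA, GPU), each with a server and a client container. The names `Variant`, `generate_variants`, `pick_variant`, the image-tag scheme, and the model name `resnet50` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Platform families named in the text; the lowercase labels are an assumption.
PLATFORMS = ("cpu", "arm", "fpga", "gpu")


@dataclass(frozen=True)
class Variant:
    """One generated AI-service variant (hypothetical structure)."""
    model: str
    platform: str
    server_image: str
    client_image: str


def generate_variants(model: str, platforms=PLATFORMS) -> list[Variant]:
    """Stand-in for the conversion and composition steps.

    A real flow would convert and containerize the model here; this sketch
    only records which server/client image would exist for each platform.
    """
    return [
        Variant(model, p, f"{model}-{p}-server", f"{model}-{p}-client")
        for p in platforms
    ]


def pick_variant(variants: list[Variant], node_platform: str) -> Variant:
    """What an orchestrator might do: match a node to the variant built for it."""
    for v in variants:
        if v.platform == node_platform:
            return v
    raise LookupError(f"no variant built for platform {node_platform!r}")


variants = generate_variants("resnet50")  # hypothetical model name
chosen = pick_variant(variants, "fpga")
print(chosen.server_image)
```

Because every variant is generated up front from the same high-level model, the orchestrator's choice reduces to a lookup by node type, which is what makes deployment on any particular node cheap at scheduling time.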

Paper Structure (13 sections, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Automated model-variant generation
  • Figure 2: Example of AGX implementation
  • Figure 3: AI service variants generation time; model conversion and final image creation.
  • Figure 4: Boxplot of the latency (ms) of each request for each AI-framework-platform model variant
  • Figure 5: Comparison of average latency (ms) between the selected AI frameworks, and the native TensorFlow implementations.