TF2AIF: Facilitating development and deployment of accelerated AI models on the cloud-edge continuum
Aimilios Leftheriotis, Achilleas Tzenetopoulos, George Lentaris, Dimitrios Soudris, Georgios Theodoridis
TL;DR
TF2AIF tackles the challenge of heterogeneous cloud-edge AI deployment by automating the generation of multiple accelerated AI-inference variants from a high-level TensorFlow input. It offers a vendor-neutral Converter-Composer pipeline that produces platform-specific server and client containers with built-in quantization and pre/post-processing abstractions, enabling rapid deployment across CPU, ARM, FPGA, and GPU accelerators. In evaluations on a Kubernetes cluster, TF2AIF generated dozens of variants within minutes and demonstrated substantial latency speedups over native TensorFlow, validating its potential for design-space exploration and ML-driven scheduling on the cloud-edge continuum. The work enables researchers and operators to benchmark, compare, and orchestrate AI workloads across heterogeneous hardware with minimal expertise and time.
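To make the "one input, many variants" idea concrete, the following is a minimal, hypothetical sketch of a Converter-Composer style enumeration. It is not TF2AIF's implementation. The platform table, the per-platform precisions, the `convert`/`compose` helpers and the container naming scheme are all illustrative assumptions. The sketch only shows how a single model list fans out into one server/client container pair per target platform.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical platform table: target name -> numeric precision assumed for it.
# These entries are illustrative assumptions, not TF2AIF's actual configuration.
PLATFORMS = {
    "cpu": "fp32",
    "arm": "int8",
    "gpu": "fp16",
    "fpga": "int8",
}


@dataclass(frozen=True)
class Variant:
    model: str
    platform: str
    precision: str
    server_image: str
    client_image: str


def convert(model: str, platform: str) -> str:
    """Hypothetical 'Converter' step: pick the precision the target is assumed to use."""
    return PLATFORMS[platform]


def compose(model: str, platform: str, precision: str) -> Variant:
    """Hypothetical 'Composer' step: name a server/client container pair."""
    tag = f"{model}-{platform}-{precision}"
    return Variant(model, platform, precision,
                   server_image=f"{tag}-server",
                   client_image=f"{tag}-client")


def generate_variants(models, platforms=PLATFORMS):
    # One variant per (model, platform) combination; iterating a dict yields its keys.
    return [compose(m, p, convert(m, p)) for m, p in product(models, platforms)]


if __name__ == "__main__":
    variants = generate_variants(["resnet50", "mobilenet"])
    print(len(variants))  # 2 models x 4 platforms = 8 variants
    for v in variants:
        print(v.server_image, v.client_image)
```

With two input models and four targets, the sketch yields eight variants. This fan-out is the kind of design space that an orchestrator could then benchmark or schedule across.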
Abstract
The B5G/6G evolution relies on connect-compute technologies and highly heterogeneous clusters with HW accelerators, which require specialized code to be utilized efficiently. This paper proposes a custom tool, TF2AIF, for generating multiple SW versions of a given AI function written in a high-level language, e.g., Python TensorFlow, while targeting multiple diverse HW+SW platforms. TF2AIF builds upon disparate tool-flows to create a plethora of corresponding containers. These containers enable the system orchestrator to deploy the requested function on any particular node in the cloud-edge continuum, i.e., to leverage the performance/energy benefits of the underlying HW under any circumstances. TF2AIF fills an identified gap in today's ecosystem and facilitates research on resource management and automated operations, while demanding minimal time or expertise from users.
