Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Yao Lu; Song Bian; Lequn Chen; Yongjun He; Yulong Hui; Matthew Lentz; Beibin Li; Fei Liu; Jialin Li; Qi Liu; Rui Liu; Xiaoxuan Liu; Lin Ma; Kexin Rong; Jianguo Wang; Yingjun Wu; Yongji Wu; Huanchen Zhang; Minjia Zhang; Qizhen Zhang; Tianyi Zhou; Danyang Zhuo

TL;DR

The paper investigates how cloud-native computing concepts can be extended to support large generative models. It proposes an AI-native computing paradigm that combines cloud-native infrastructure (containerization, multi-tenancy, and serverless computing) with advanced ML runtimes such as batched LoRA inference, with the aim of lowering cost of goods sold (COGS) and improving access to scarce GPU resources. It draws analogies between large-model-as-a-service (LMaaS) and database-as-a-service (DBaaS), compares RAG-as-a-Service with BI-as-a-Service, and outlines a research agenda for integrating ML workloads more tightly with cloud-native platforms. Three preliminary case studies (elastic resource scheduling, multi-tenant batched LoRA inference, and hybrid-cloud deployment optimization) illustrate potential efficiency and cost benefits while showing how much work remains. The paper also surveys related work and identifies open challenges in runtime efficiency, continuous learning, availability, and heterogeneous infrastructure, arguing for deeper co-design of ML systems and cloud-native technologies.

Abstract

In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges such as escalating costs and heavy demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtimes (e.g., batched LoRA inference). These joint efforts aim to reduce cost of goods sold (COGS) and improve resource accessibility. The effort to merge these two domains has only just begun, and we hope to stimulate future research and development in this area.
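Batched LoRA inference is named above as one of the ML runtime techniques behind the AI-native paradigm. The following is not the authors' implementation; it is a minimal pure-Python sketch, assuming a single linear layer whose weight `W` is shared by all tenants while each tenant owns a rank-r LoRA adapter `(A, B)`, so a request `x` from tenant `t` produces `W x + scale * B_t (A_t x)`. It shows why batching helps multi-tenant serving: the base-weight product is computed once for the whole mixed batch, and each tenant's low-rank update is applied only to its own requests. The function names and toy matrices are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a dense matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of a (n x k) and b (k x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def batched_lora_forward(
    base_w: Matrix,                              # shared weight W, d_out x d_in
    adapters: Dict[str, Tuple[Matrix, Matrix]],  # tenant -> (A: r x d_in, B: d_out x r)
    batch: List[Tuple[str, List[float]]],        # mixed requests: (tenant, x)
    scale: float = 1.0,                          # hypothetical LoRA scaling factor
) -> Matrix:
    xs = [x for _, x in batch]                   # stack inputs: n x d_in
    # One shared base-model product for every request in the batch: n x d_out.
    out = matmul(xs, transpose(base_w))
    # Group request indices by tenant so each adapter is applied once per group.
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, (tenant, _) in enumerate(batch):
        groups[tenant].append(i)
    for tenant, idxs in groups.items():
        a, b = adapters[tenant]
        x_sub = [xs[i] for i in idxs]            # k x d_in
        # Low-rank update: (k x d_in)(d_in x r) -> k x r, then (k x r)(r x d_out).
        delta = matmul(matmul(x_sub, transpose(a)), transpose(b))
        for row, i in enumerate(idxs):
            out[i] = [o + scale * d for o, d in zip(out[i], delta[row])]
    return out


# Toy example: identity base weight and two rank-1 tenant adapters.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "t1": ([[1.0, 0.0]], [[1.0], [0.0]]),  # adds [x0, 0]
    "t2": ([[0.0, 1.0]], [[0.0], [2.0]]),  # adds [0, 2 * x1]
}
batch = [("t1", [1.0, 2.0]), ("t2", [3.0, 4.0]), ("t1", [5.0, 6.0])]
outputs = batched_lora_forward(base, adapters, batch)
print(outputs)  # [[2.0, 2.0], [3.0, 12.0], [10.0, 6.0]]
```

The grouping step is the design choice that matters: one copy of the base model serves every tenant, and per-tenant cost grows only with the adapter rank r. Production runtimes perform this grouping with fused GPU kernels rather than Python loops.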
