Software Performance Engineering for Foundation Model-Powered Software
Haoxiang Zhang, Shi Chang, Arthur Leung, Kishanthan Thangarajah, Boyuan Chen, Hanan Lutfiyya, Ahmed E. Hassan
TL;DR
This paper argues that foundation-model-powered software (FMware) needs production-level performance guarantees and that performance engineering should be treated as a first-class concern. The authors identify four software performance engineering (SPE) challenges: cognitive architecture design, token-efficient communication, tuning and optimization, and deployment. They propose an SLA-aware FMware Runtime, composed of a Profiler, a Resource Provisioner, and a Replica Router, that enforces end-to-end SLAs using a latency estimate $l_{est} = l_x + l_q + l_r$. Their evaluation against Ray Serve on a heterogeneous cluster reports substantial latency and goodput improvements, along with gains in resource utilization. The work calls for benchmarks and tooling to broaden adoption of SLA-driven engineering across Promptware and Agentware projects.
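To make the additive latency estimate concrete, the following is a minimal sketch of how a router might use it. It is an illustration under stated assumptions, not the authors' implementation: the summary does not define the three terms, so they are kept as opaque per-replica components measured in seconds, and the `ReplicaEstimate` class, the `pick_replica` function, and the "lowest estimate within the SLA" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReplicaEstimate:
    """Per-replica latency components in seconds (hypothetical structure)."""
    name: str
    l_x: float
    l_q: float
    l_r: float

    @property
    def l_est(self) -> float:
        # Additive estimate from the summary: l_est = l_x + l_q + l_r
        return self.l_x + self.l_q + self.l_r


def pick_replica(
    replicas: Sequence[ReplicaEstimate], sla_seconds: float
) -> Optional[ReplicaEstimate]:
    """Return the replica with the lowest estimate that fits the SLA, else None.

    Assumed routing rule for illustration only.
    """
    feasible = [r for r in replicas if r.l_est <= sla_seconds]
    return min(feasible, key=lambda r: r.l_est) if feasible else None


if __name__ == "__main__":
    candidates = [
        ReplicaEstimate("gpu-a", l_x=0.5, l_q=0.25, l_r=0.125),
        ReplicaEstimate("gpu-b", l_x=0.25, l_q=1.0, l_r=0.125),
    ]
    chosen = pick_replica(candidates, sla_seconds=1.0)
    print(chosen.name if chosen else "no replica meets the SLA")
```

Returning `None` when no replica fits leaves the fallback policy (reject, queue, or provision more capacity) to the caller, which is where a component like the Resource Provisioner would plausibly come in.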
Abstract
The rise of Foundation Models (FMs) like Large Language Models (LLMs) is revolutionizing software development. Despite impressive prototypes, transforming FMware into production-ready products demands complex engineering across various domains. A critical but overlooked aspect is performance engineering, which aims to ensure that FMware meets performance goals such as throughput and latency, avoiding user dissatisfaction and financial loss. Performance considerations are often an afterthought, leading to costly optimization efforts after deployment. FMware's high computational resource demands highlight the need for efficient hardware use, and continuous performance engineering is essential to prevent degradation. This paper highlights the significance of Software Performance Engineering (SPE) in FMware, identifying four key challenges: cognitive architecture design (i.e., the structural design that defines how AI components interact, reason, and interface with classical software components), communication protocols, tuning and optimization, and deployment. These challenges are based on literature surveys and experiences from developing an in-house FMware system. We discuss problems, current practices, and innovative paths for the software engineering community.
