Scaling Down to Scale Up: A Cost-Benefit Analysis of Replacing OpenAI's LLM with Open Source SLMs in Production

Chandra Irugalbandara; Ashish Mahendra; Roland Daynauth; Tharuka Kasthuri Arachchige; Jayanaka Dantanarayana; Krisztian Flautner; Lingjia Tang; Yiping Kang; Jason Mars

TL;DR

The paper asks whether open-source small language models (SLMs) can replace proprietary LLMs such as OpenAI's GPT-4 in production. It introduces SLaM, an open-source framework for automated hosting, evaluation, and cost-performance analysis of SLMs against GPT-4. It then applies SLaM to a real production feature, the Daily Pep Talk in myca.ai, across 9 SLMs (29 variants). Several SLMs achieve near-GPT-4 quality, with significantly more predictable latency and cost reductions of 5× to 29×, demonstrating practical viability for production use. The work provides a concrete methodology and tooling for systematically evaluating SLM readiness, offering a path toward cost-effective, reliable, self-hosted AI features in industry settings.
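"More predictable latency" can be made concrete by summarizing per-request response times for each model. The sketch below is a minimal illustration that assumes two common consistency measures, the p99/p50 tail ratio and the coefficient of variation (cv). These are not necessarily the metrics SLaM reports. The helper names and the latency samples are hypothetical and are not measurements from the paper.

```python
import math
import statistics


def nearest_rank_percentile(sorted_values: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not sorted_values:
        raise ValueError("need at least one latency sample")
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def latency_consistency(latencies_s: list[float]) -> dict[str, float]:
    """Summarize per-request latencies (seconds) for one model.

    Assumed metrics: a lower p99/p50 tail ratio and a lower coefficient
    of variation (cv) both indicate more predictable response times.
    """
    ordered = sorted(latencies_s)
    mean = statistics.mean(ordered)
    p50 = nearest_rank_percentile(ordered, 50)
    p99 = nearest_rank_percentile(ordered, 99)
    return {
        "p50": p50,
        "p99": p99,
        "tail_ratio": p99 / p50,
        "cv": statistics.pstdev(ordered) / mean,
    }


if __name__ == "__main__":
    # Hypothetical samples for illustration only; not data from the paper.
    hosted_api = [2.1, 2.4, 2.2, 9.8, 2.3, 14.5, 2.2, 2.6]
    self_hosted_slm = [1.9, 2.0, 2.1, 2.0, 1.9, 2.2, 2.0, 2.1]
    for name, samples in [("hosted API", hosted_api), ("self-hosted SLM", self_hosted_slm)]:
        s = latency_consistency(samples)
        print(f"{name}: p50={s['p50']:.2f}s p99={s['p99']:.2f}s "
              f"tail_ratio={s['tail_ratio']:.2f} cv={s['cv']:.2f}")
```

With the made-up samples above, occasional multi-second spikes in the hosted-API series inflate both the tail ratio and the cv, while the median barely moves. For this reason, tail percentiles are usually more informative about predictability than averages.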

Abstract

Many companies use large language models (LLMs) offered as a service, such as OpenAI's GPT-4, to create AI-enabled product experiences. Along with the benefits of ease of use and shortened time-to-solution, this reliance on proprietary services has downsides in model control, performance reliability, uptime predictability, and cost. At the same time, a flurry of open-source small language models (SLMs) has been made available for commercial use. However, their readiness to replace existing capabilities remains unclear, and a systematic approach to holistically evaluating these SLMs is not readily available. This paper presents a systematic evaluation methodology and a characterization of modern open-source SLMs and their trade-offs when replacing proprietary LLMs for a real-world product feature. We have designed SLaM, an open-source automated analysis tool that enables quantitative and qualitative testing of product features using arbitrary SLMs. Using SLaM, we examine the quality and performance characteristics of modern SLMs relative to an existing customer-facing implementation built on the OpenAI GPT-4 API. Across 9 SLMs and their 29 variants, we observe that SLMs provide competitive results, significant improvements in performance consistency, and a cost reduction of 5× to 29× compared to GPT-4.
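A cost-reduction factor such as "5× to 29×" is a ratio of per-request costs. The sketch below shows one simple way to form that ratio. It assumes a hosted API billed per token and a self-hosted SLM billed per GPU-hour at a fixed sustained throughput. This is a simplified model and not necessarily the cost model SLaM uses. The class names, prices, token counts, and throughput are hypothetical placeholders, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerTokenApi:
    """Hosted LLM billed per token (USD per 1,000 tokens). Prices are placeholders."""
    usd_per_1k_input: float
    usd_per_1k_output: float

    def cost_per_request(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.usd_per_1k_input
                + output_tokens * self.usd_per_1k_output) / 1000


@dataclass(frozen=True)
class SelfHostedSlm:
    """Self-hosted SLM on a GPU instance billed per hour. Values are placeholders."""
    usd_per_gpu_hour: float
    requests_per_hour: float  # assumed sustained throughput on that instance

    def cost_per_request(self) -> float:
        if self.requests_per_hour <= 0:
            raise ValueError("throughput must be positive")
        return self.usd_per_gpu_hour / self.requests_per_hour


def cost_reduction_factor(api_cost: float, slm_cost: float) -> float:
    """How many times cheaper the SLM is per request (api_cost / slm_cost)."""
    return api_cost / slm_cost


if __name__ == "__main__":
    # Hypothetical inputs for illustration; not the paper's measured numbers.
    api = PerTokenApi(usd_per_1k_input=0.03, usd_per_1k_output=0.06)
    slm = SelfHostedSlm(usd_per_gpu_hour=1.20, requests_per_hour=600)
    api_cost = api.cost_per_request(input_tokens=500, output_tokens=300)
    slm_cost = slm.cost_per_request()
    print(f"API: ${api_cost:.4f}/request, SLM: ${slm_cost:.4f}/request, "
          f"reduction: {cost_reduction_factor(api_cost, slm_cost):.1f}x")
```

This model assumes the GPU is kept busy at its measured throughput. Idle capacity, engineering and operations effort, and redundancy would all raise the self-hosted cost per request and shrink the factor. A real comparison therefore needs to state which of these costs it includes.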

Paper Structure

This paper contains 29 sections, 17 figures, and 1 table.

Introduction
Background: The Recent Evolution of LLMs
Large Language Models
Impact of LLMs
Proprietary LLMs and OpenAI APIs
Open-source Small Language Models
Developing with LLMs
Problem: Realizing the "Daily Pep Talk" Feature
Product Feature Case Study
Challenges with OpenAI APIs
Replacing OpenAI with SLMs
SLaM Methodology and Tool
SLaM Architecture and Components
SLaM Response Quality Evaluation Methodology
Human Evaluation
...and 14 more sections

Figures (17)

Figure 1: Brief history of evolution of language models and recent surge in open-source SLMs.
Figure 2: OpenAI APIs status, captured on 12/14/23
Figure 3: SLaM Tool UI Interface
Figure 4: Architecture Overview of the SLaM Tool
Figure 5: Human Evaluation
...and 12 more figures
