Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI

Yonas Atinafu, Henry Lin, Robin Cohen

Abstract

In this paper, we present a novel black-box online controller for LLM serving. It uses only end-to-end measurements over short segments, without internal instrumentation, and applies hill climbing to maximize goodput, defined as the throughput of requests that satisfy the service-level objective (SLO). We provide empirical evidence that this design is well-founded. Using this advance in LLM serving as a concrete example, we then discuss the importance of integrating system performance and sustainability metrics into Factsheets for organizations adopting AI systems.
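To make the abstract's two central ideas concrete, the following is a minimal sketch of goodput as a metric and of a greedy hill climb over integer serving knobs. It is an illustration under stated assumptions, not the authors' SLO-Tuner: the function names, the one-step neighborhood, the stopping rule, and the synthetic `toy_measure` function are all hypothetical. The knob names (concurrency, batch size, speculative width) follow the Figure 1 caption.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def goodput(latencies_s: List[float], slo_s: float, window_s: float) -> float:
    """Requests per second that completed within the latency SLO."""
    within_slo = sum(1 for t in latencies_s if t <= slo_s)
    return within_slo / window_s


def hill_climb(
    measure: Callable[[Config], float],
    start: Config,
    bounds: Dict[str, Tuple[int, int]],
    max_rounds: int = 50,
) -> Tuple[Config, float]:
    """Greedy coordinate hill climb (illustrative, not the paper's algorithm).

    `measure` is treated as a black box: it receives a configuration and
    returns one end-to-end score (here, goodput) for a short segment.
    """
    best = dict(start)
    best_score = measure(best)
    for _ in range(max_rounds):
        improved = False
        for knob, (low, high) in bounds.items():
            for step in (-1, 1):
                value = best[knob] + step
                if not low <= value <= high:
                    continue  # stay inside the allowed range for this knob
                candidate = {**best, knob: value}
                score = measure(candidate)
                if score > best_score:
                    best, best_score = candidate, score
                    improved = True
        if not improved:
            break  # no single-step neighbor is better: local optimum
    return best, best_score


def toy_measure(cfg: Config) -> float:
    """Synthetic stand-in for a real measurement; peaks at (8, 4, 2)."""
    return (
        100.0
        - (cfg["concurrency"] - 8) ** 2
        - (cfg["batch_size"] - 4) ** 2
        - (cfg["spec_width"] - 2) ** 2
    )


if __name__ == "__main__":
    start = {"concurrency": 1, "batch_size": 1, "spec_width": 0}
    bounds = {"concurrency": (1, 16), "batch_size": (1, 8), "spec_width": (0, 4)}
    print(hill_climb(toy_measure, start, bounds))
```

In a real deployment, `measure` would apply the candidate configuration to the server, replay or observe traffic for a short segment, and return `goodput(...)` computed from the observed latencies. Real measurements are noisy, so a practical controller would likely need repeated or smoothed measurements before accepting a move; this sketch assumes a deterministic score.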

Paper Structure

This paper contains 7 sections, 3 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: SLO-Tuner treats the server as a black box and tunes concurrency, batch size, and speculative width from tail latency and goodput.
  • Figure 2: Simulator goodput–p99 tradeoffs (steady vs. bursty).
  • Figure 3: Simulator hill-climb trajectories.
  • Figure 4: Simulator ablations over speculative width, verifier cadence, batch size.
  • Figure 5: vLLM experiments: SLO-Tuner trajectory and ablations over concurrency, batch size, and speculative width.
  • ...and 1 more figure