Small Language Models for Application Interactions: A Case Study
Beibin Li, Yi Zhang, Sébastien Bubeck, Jeevan Pathuri, Ishai Menache
TL;DR
This paper evaluates Small Language Models (SLMs) as natural-language interfaces to an internal Microsoft application for cloud supply chain fulfilment. It shows that fine-tuned SLMs such as Phi-3, Llama 3, and Mistral v0.2 can achieve higher accuracy and faster responses than much larger LLMs, even with modest training data, which in turn enables on-device deployment. The authors present a data-generation and fine-tuning pipeline (templates, prompt evolution, and LoRA-based training) that teaches the models to translate user queries into Python code that invokes internal APIs. The study also discusses practical implications for edge-friendly enterprise natural-language interfaces, with design considerations for handling in-domain versus out-of-domain queries, task routing, and cost.
Abstract
We study the efficacy of Small Language Models (SLMs) in facilitating application usage through natural language interactions. Our focus here is on a particular internal application used at Microsoft for cloud supply chain fulfilment. Our experiments show that small models can outperform much larger ones in terms of both accuracy and running time, even when fine-tuned on small datasets. Alongside these results, we also highlight SLM-based system design considerations.
