LocalGPT: Benchmarking and Advancing Large Language Models for Local Life Services in Meituan
Xiaochong Lan, Jie Feng, Jiahuan Lei, Xinlei Shi, Yong Li
TL;DR
The paper tackles the challenge of applying large language models (LLMs) to local life services, where domain shift and task diversity impede zero-shot generalization. It introduces LocalEval, a 41-task benchmark spanning four categories, and a multi-agent instruction synthesis pipeline (LocalInstruction) that produces high-quality fine-tuning data, complemented by agentic workflows for complex composite tasks. Experiments show that a 7B-parameter model fine-tuned with LocalInstruction performs comparably to a 72B model on LocalEval tasks, enabling cost-effective deployment; cross-city analyses indicate that diverse domain data matters for transferability. Real-world deployments on Meituan report business benefits in recommendation, search, and review ranking, supporting the approach's practical value and scalability for local life platforms.
Abstract
Large language models (LLMs) have shown remarkable capabilities and achieved significant breakthroughs across many domains, leading to their widespread adoption in recent years. Building on this progress, we investigate their potential for local life services. We establish a comprehensive benchmark and systematically evaluate diverse LLMs on a wide range of tasks relevant to local life services. To further improve their effectiveness, we explore two approaches: model fine-tuning and agent-based workflows. Our findings show that even a relatively compact 7B model can attain performance comparable to that of a much larger 72B model, balancing inference cost against model capability. This makes deploying LLMs in real-world online services more feasible and efficient, and therefore more practical for local life applications.
