LocalSUG: Geography-Aware LLM for Query Suggestion in Local-Life Services

Jinwen Chen, Shuai Gong, Shiwen Zhang, Zheng Zhang, Yachao Zhao, Lingxiang Wang, Haibo Zhou, Yuan Zhan, Wei Lin, Hainan Zhang

TL;DR

This work proposes LocalSUG, an LLM-based query suggestion framework tailored for local-life service platforms. It introduces a city-aware candidate mining strategy based on term co-occurrence to inject geographic grounding into generation, and it develops quality-aware beam acceleration and vocabulary pruning techniques that significantly reduce online latency while preserving generation quality.

Abstract

In local-life service platforms, the query suggestion module plays a crucial role in enhancing user experience by generating candidate queries based on user input prefixes, thus reducing user effort and accelerating search. Traditional multi-stage cascading systems rely heavily on historical top queries, limiting their ability to address long-tail demand. While LLMs offer strong semantic generalization, deploying them in local-life services introduces three key challenges: lack of geographic grounding, exposure bias in preference optimization, and online inference latency. To address these issues, we propose LocalSUG, an LLM-based query suggestion framework tailored for local-life service platforms. First, we introduce a city-aware candidate mining strategy based on term co-occurrence to inject geographic grounding into generation. Second, we propose a beam-search-driven GRPO algorithm that aligns training with inference-time decoding, reducing exposure bias in autoregressive generation. A multi-objective reward mechanism further optimizes both relevance and business-oriented metrics. Finally, we develop quality-aware beam acceleration and vocabulary pruning techniques that significantly reduce online latency while preserving generation quality. Extensive offline evaluations and large-scale online A/B testing demonstrate that LocalSUG improves click-through rate (CTR) by 0.35% and reduces the low/no-result rate by 2.56%, validating its effectiveness in real-world deployment.
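
To make the first contribution concrete, the following is a minimal sketch of what city-aware candidate mining from co-occurrence counts could look like. It is not the authors' implementation: the log schema of (city, prefix, query) tuples, the function names build_tables and mine_candidates, and the rule of backfilling city-specific candidates with global ones are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_tables(log):
    """Count prefix-to-query co-occurrences per city and globally.

    `log` is an iterable of (city, prefix, query) tuples. This schema is
    a hypothetical stand-in for a real search log.
    """
    city_table = defaultdict(Counter)    # (city, prefix) -> query counts
    global_table = defaultdict(Counter)  # prefix -> query counts
    for city, prefix, query in log:
        city_table[(city, prefix)][query] += 1
        global_table[prefix][query] += 1
    return city_table, global_table


def mine_candidates(city, prefix, city_table, global_table, k=3):
    """Return up to k candidates: city-specific first, then global backfill.

    The backfill rule is an assumption, not a rule stated in the source.
    """
    city_counts = city_table.get((city, prefix), Counter())
    candidates = [q for q, _ in city_counts.most_common(k)]
    for q, _ in global_table.get(prefix, Counter()).most_common():
        if len(candidates) >= k:
            break
        if q not in candidates:
            candidates.append(q)
    return candidates


if __name__ == "__main__":
    log = (
        [("A", "hot", "hotpot A-street")] * 3
        + [("A", "hot", "hot spring")] * 1
        + [("B", "hot", "hot spring")] * 4
        + [("B", "hot", "hotpot B-road")] * 2
    )
    city_table, global_table = build_tables(log)
    print(mine_candidates("A", "hot", city_table, global_table))
```

In this toy log, a user in city A sees the locally popular query ranked ahead of the globally popular one, while a city with no history falls back to global counts. In the framework described above, such candidates would then be placed in the model input alongside the other sources rather than shown directly.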

Paper Structure

This paper contains 31 sections, 6 equations, 5 figures, 6 tables, 2 algorithms.

Figures (5)

  • Figure 1: Examples of query suggestion in local-life service platforms.
  • Figure 2: Overview of our proposed generative query suggestion framework. (a) Candidate Mining and Multi-Source Input: Prefix-to-query candidates are retrieved via city-aware and global co-occurrence rules, then concatenated with hot words, user history, and profiles. (b) Beam-Search-Driven GRPO Training: The model is optimized using group relative rewards to bridge the gap between training and inference, incorporating hit, rank, and format objectives. (c) Beam Search Inference Acceleration: Online performance is enhanced through Quality-Aware Accelerated Beam Search and LLM head pruning.
  • Figure 3: Impact of (a) QA-BS and (b) vocabulary pruning on inference latency and model performance.
  • Figure 4: Performance stability across different GRPO group sizes ($G$).
  • Figure 5: Sensitivity analysis of retrieval candidates and beam search width on efficiency and effectiveness.
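
As an illustration of the group relative rewards mentioned in the Figure 2(b) caption, the sketch below scores a group of beam-search candidates with a combined hit, rank, and format reward and then normalizes the rewards within the group, as GRPO-style methods do. The reward definitions, the weights w_hit, w_rank, and w_format, and the function names are assumptions chosen for the example. The paper's actual reward formulation is not given in this section.

```python
import statistics


def reward(candidate, position, clicked_query,
           w_hit=1.0, w_rank=0.5, w_format=0.2):
    """Combine hit, rank, and format signals into one scalar reward.

    All three definitions and the weights are hypothetical.
    """
    hit = 1.0 if candidate == clicked_query else 0.0
    # Reciprocal-rank style bonus: a hit higher in the beam earns more.
    rank = hit / (position + 1)
    # Format check: non-empty, single-line suggestion.
    fmt = 1.0 if candidate.strip() and "\n" not in candidate else 0.0
    return w_hit * hit + w_rank * rank + w_format * fmt


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One group = the beams decoded for a single prefix (G = 3 here).
    beams = ["hotpot near me", "hot spring", ""]
    clicked = "hot spring"
    rewards = [reward(b, i, clicked) for i, b in enumerate(beams)]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the advantages are computed relative to the other beams for the same prefix, a candidate is rewarded for being better than its siblings rather than for its absolute score, which is the property that lets training reuse the same beam-search decoding as inference.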