Personalized Query Auto-Completion for Long and Short-Term Interests with Adaptive Detoxification Generation

Zhibo Wang, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li

TL;DR

This work tackles two real-world gaps in query auto-completion: the need for hierarchical personalization that jointly leverages long-term and short-term user interests, and the requirement for adaptive detoxification to prevent toxic outputs on noisy prefixes. The authors propose LaD, a model that encodes long-term and short-term user signals in a hierarchical framework and uses a Detoxification Expert with a [Reject] token and Reject Preference Optimization to enforce non-toxic, prefix-relevant generations under tight latency. Through KSQAC experiments and online A/B tests, LaD achieves strong detoxification performance while maintaining competitive generation quality, and delivers substantial gains in CTR, engagement, and retention on Kuaishou, where it is deployed at scale. The approach combines a Long-Short Interests Hierarchical Capturing scheme with end-to-end detoxification, demonstrating practical viability for real-time industrial QAC systems and offering a path toward safer, personalized search experiences at scale.

Abstract

Query auto-completion (QAC) plays a crucial role in modern search systems. However, in real-world applications, there are two pressing challenges that still need to be addressed. First, there is a need for hierarchical personalized representations for users. Previous approaches have typically used users' search behavior as a single, overall representation, which proves inadequate in more nuanced generative scenarios. Additionally, query prefixes are typically short and may contain typos or sensitive information, increasing the likelihood of generating toxic content compared to traditional text generation tasks. Such toxic content can degrade user experience and lead to public relations issues. Therefore, the second critical challenge is detoxifying QAC systems. To address these two limitations, we propose a novel model (LaD) that captures personalized information from both long-term and short-term interests, incorporating adaptive detoxification. In LaD, personalized information is captured hierarchically at both coarse-grained and fine-grained levels. This approach preserves as much personalized information as possible while enabling online generation within time constraints. Going a step further, we propose an online training method based on Reject Preference Optimization (RPO). By incorporating a special token [Reject] during both the training and inference processes, the model achieves adaptive detoxification. Consequently, the generated text presented to users is both non-toxic and relevant to the given prefix. We conduct comprehensive experiments on industrial-scale datasets and perform online A/B tests, delivering the largest single-experiment metric improvement for our product in nearly two years. Our model has been deployed on Kuaishou search, driving the primary traffic for hundreds of millions of active users. The code is available at https://github.com/JXZe/LaD.
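
As a rough illustration of the [Reject] idea described above, the following Python sketch shows one way a serving layer might drop candidates that a generator has marked with the special token. It is a minimal sketch under stated assumptions, not the authors' implementation: the functions `complete` and `toy_generate`, the `top_k` parameter, and the toy lookup table are all hypothetical, and this section does not specify how LaD actually consumes [Reject] at inference time.

```python
from typing import Callable, List

# Special token named in the abstract; how the real model emits it is not
# described in this section, so its use here is an assumption.
REJECT_TOKEN = "[Reject]"


def complete(prefix: str,
             generate: Callable[[str], List[str]],
             top_k: int = 5) -> List[str]:
    """Return up to top_k completions, dropping rejected candidates.

    `generate` is a hypothetical stand-in for a generative model that
    returns ranked candidate completions for a prefix.
    """
    candidates = generate(prefix)
    # Keep only candidates that the model did not mark with [Reject].
    kept = [c for c in candidates if REJECT_TOKEN not in c]
    return kept[:top_k]


def toy_generate(prefix: str) -> List[str]:
    """Toy generator (hypothetical): a lookup table instead of a model."""
    table = {
        "hair": ["hair styles for women", "hair dye tutorial"],
        "noisy": ["[Reject]", "noise cancelling headphones"],
        "toxic": ["[Reject]"],
    }
    return table.get(prefix, [])


if __name__ == "__main__":
    for p in ("hair", "noisy", "toxic"):
        print(p, "->", complete(p, toy_generate))
```

In this sketch a prefix whose candidates are all rejected simply yields an empty suggestion list; a production system would need its own fallback policy, which is outside what this section describes.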

Paper Structure

This paper contains 16 sections, 9 equations, 6 figures, 9 tables, 1 algorithm.

Figures (6)

  • Figure 1: A schematic illustration of our method LaD. (a) Given the user's behavior and the prefix, our model can generate completions from the long-term (women) and short-term (hair) interests, respectively; (b) Given a toxic prefix (fruit strwab and fuck), our model can adaptively reply with non-toxic content.
  • Figure 2: Overall structure of LaD, where "[R]" denotes the special token [Reject], "LTE" denotes the Long-term interests Transformer Encoder, and "GLM" denotes the Generative Language Model. The model primarily consists of two components: Long-Short Interests Hierarchical Capturing and Adaptive Detoxification.
  • Figure 3: An illustration of online generation.
  • Figure 4: Case study of our full model LaD. "Hu Tao" is a character in Genshin Impact.
  • Figure 5: Case study of our full model LaD. "Roseanne Park" is a Korean-New Zealand singer and dancer, and is a member of the South Korean girl group BLACKPINK.
  • ...and 1 more figure