Personalized Query Auto-Completion for Long and Short-Term Interests with Adaptive Detoxification Generation
Zhibo Wang, Xiaoze Jiang, Zhiheng Qin, Enyun Yu, Han Li
TL;DR
This work tackles two real-world gaps in query auto-completion: the need for hierarchical personalization that jointly leverages long-term and short-term user interests, and the requirement for adaptive detoxification to prevent toxic outputs on noisy prefixes. The authors propose LaD, a model that encodes long-term and short-term user signals in a hierarchical framework and uses a Detoxification Expert with a [Reject] token and Reject Preference Optimization to enforce non-toxic, prefix-relevant generations under tight latency. Through KSQAC experiments and online A/B tests, LaD achieves strong detoxification performance while maintaining competitive generation quality, and delivers substantial gains in CTR, engagement, and retention on Kuaishou, where it is deployed at scale. The approach combines a Long-Short Interests Hierarchical Capturing scheme with end-to-end detoxification, demonstrating practical viability for real-time industrial QAC systems and offering a path toward safer, personalized search experiences at scale.
Abstract
Query auto-completion (QAC) plays a crucial role in modern search systems. However, two pressing challenges remain in real-world applications. First, QAC needs hierarchical personalized representations of users. Previous approaches have typically collapsed a user's search behavior into a single, overall representation, which proves inadequate in more nuanced generative scenarios. Second, query prefixes are typically short and may contain typos or sensitive information, which makes toxic output more likely than in traditional text generation tasks. Such toxic content can degrade user experience and lead to public relations issues, so detoxifying QAC systems is the second critical challenge. To address these two limitations, we propose a novel model (LaD) that captures personalized information from both long-term and short-term interests and incorporates adaptive detoxification. In LaD, personalized information is captured hierarchically at both coarse-grained and fine-grained levels. This approach preserves as much personalized information as possible while enabling online generation within time constraints. Going a step further, we propose an online training method based on Reject Preference Optimization (RPO). By incorporating a special token [Reject] during both the training and inference processes, the model achieves adaptive detoxification. Consequently, the generated text presented to users is both non-toxic and relevant to the given prefix. We conduct comprehensive experiments on industrial-scale datasets and perform online A/B tests, delivering the largest single-experiment metric improvement for our product in nearly two years. Our model has been deployed on Kuaishou search, driving the primary traffic for hundreds of millions of active users. The code is available at https://github.com/JXZe/LaD.
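As a rough illustration of how a [Reject] token could be used at inference time, the sketch below drops any candidate that the generator has marked with [Reject] before results are shown. This is only a minimal sketch under stated assumptions, not the authors' implementation: `filter_completions`, `toy_generate`, and the `top_k` parameter are hypothetical names, and the toy generator is a lookup table standing in for the trained model.

```python
from typing import Callable, List

REJECT_TOKEN = "[Reject]"  # special token named in the abstract


def filter_completions(
    prefix: str,
    generate: Callable[[str], List[str]],  # hypothetical model interface
    top_k: int = 5,
) -> List[str]:
    """Return up to top_k completions, skipping those marked with [Reject]."""
    shown: List[str] = []
    for candidate in generate(prefix):
        # Assumption: a rejected generation starts with the [Reject] token.
        if candidate.strip().startswith(REJECT_TOKEN):
            continue
        shown.append(candidate)
        if len(shown) == top_k:
            break
    return shown


def toy_generate(prefix: str) -> List[str]:
    """Hypothetical stand-in for the trained model (a fixed lookup table)."""
    table = {
        "coo": ["cooking recipes", "cool gadgets"],
        "bad": [REJECT_TOKEN],
    }
    return table.get(prefix, [])


if __name__ == "__main__":
    print(filter_completions("coo", toy_generate))
    print(filter_completions("bad", toy_generate))
```

In this sketch a prefix whose candidates are all rejected simply yields an empty suggestion list; how the deployed system handles that case is not stated in the abstract.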
