LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations
Soumik Dey, Benjamin Braun, Naveen Ravipati, Hansi Wu, Binbin Li
TL;DR
This work tackles misalignment and bias in advertiser keyphrase recommendations using a multi-task distillation pipeline that transfers knowledge from an LLM-based teacher to a cross-encoder assistant and finally to a lightweight bi-encoder student. The approach combines click-through-rate (CTR), search-relevance, and LLM-generated labels within a multi-dataset training framework and uses distillation losses such as a Pearson-correlation loss. It improves retrieval quality while keeping latency practical through Matryoshka embeddings. Offline ablations and an in-production A/B test show significant business impact, including substantial gains in GMB and ROAS and higher seller adoption of recommended keyphrases. The proposed evaluation protocol combines de-duplication, relevance filtering, and LLM-based judgment to approximate real-world performance in a two-sided marketplace, offering a scalable path to production-ready advertiser keyphrase retrieval systems.
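One common way to turn Pearson correlation into a distillation objective is to minimize 1 - r, where r is the correlation between the student's scores and the teacher's scores over a batch of (item, keyphrase) pairs. The sketch below assumes that formulation. The function names are hypothetical, and the paper's exact loss (batching, weighting across tasks, combination with other losses) is not specified here.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Center both score lists around their means.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0.0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / norm


def pearson_distillation_loss(
    student_scores: Sequence[float], teacher_scores: Sequence[float]
) -> float:
    """Hypothetical loss 1 - r: 0 when the student is a positive linear
    function of the teacher, 2 when it is perfectly anti-correlated."""
    return 1.0 - pearson_correlation(student_scores, teacher_scores)


# Toy batch: teacher relevance scores for four (item, keyphrase) pairs.
teacher = [0.9, 0.2, 0.6, 0.1]
aligned_student = [2.0 * t + 1.0 for t in teacher]  # same ordering, different scale
inverted_student = [-t for t in teacher]            # reversed ordering

print(pearson_distillation_loss(aligned_student, teacher))   # approximately 0.0
print(pearson_distillation_loss(inverted_student, teacher))  # approximately 2.0
```

Because Pearson correlation is invariant to shifting and positively rescaling either score list, this loss asks the student to reproduce the teacher's relative scoring pattern rather than its exact calibration. That property is useful when teacher and student produce scores on different scales.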
Abstract
E-commerce sellers are advised to bid on keyphrases to boost their advertising campaigns. These keyphrases must be relevant, both to keep irrelevant items from cluttering search results and to maintain a positive seller perception. It is vital that keyphrase suggestions align with seller, search, and buyer judgments. Because negative feedback is hard to collect in these systems, LLMs have been used as a scalable proxy for human judgments. This paper presents an empirical study, conducted on a major e-commerce platform, of a distillation framework involving an LLM teacher, a cross-encoder assistant, and a bi-encoder Embedding Based Retrieval (EBR) student model, aimed at mitigating click-induced biases in keyphrase recommendations.
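At serving time, a bi-encoder EBR student scores each keyphrase independently of the item, so keyphrase embeddings can be precomputed and compared with an item embedding by cosine similarity. The sketch below assumes Matryoshka-style embeddings, meaning vectors trained so that a leading prefix of the coordinates is itself a usable embedding. It truncates both sides to the first `dim` coordinates and renormalizes before scoring. The function names, the brute-force scan (a production system would use an approximate nearest-neighbor index), and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` coordinates (Matryoshka prefix) and L2-normalize them."""
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(v * v for v in prefix))
    if norm == 0.0:
        raise ValueError("zero vector after truncation")
    return [v / norm for v in prefix]


def retrieve_top_k(
    item_emb: Sequence[float],
    keyphrase_embs: Dict[str, Sequence[float]],
    dim: int,
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force cosine retrieval over truncated embeddings (hypothetical helper)."""
    query = truncate_and_normalize(item_emb, dim)
    scored = []
    for phrase, emb in keyphrase_embs.items():
        candidate = truncate_and_normalize(emb, dim)
        # Dot product of unit vectors equals cosine similarity.
        score = sum(a * b for a, b in zip(query, candidate))
        scored.append((phrase, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 4-d embeddings; in practice these would come from the trained student model.
item = [1.0, 0.0, 0.5, 0.5]
keyphrases = {
    "vintage camera": [0.9, 0.1, -0.3, 0.2],
    "camera lens": [0.5, 0.5, 0.5, 0.5],
    "phone case": [0.0, 1.0, 0.0, 0.0],
}
for phrase, score in retrieve_top_k(item, keyphrases, dim=2, k=2):
    print(phrase, round(score, 3))
```

Shrinking `dim` reduces storage and similarity cost roughly in proportion, which is the latency lever the TL;DR attributes to Matryoshka embeddings. The trade-off is some loss of ranking fidelity relative to the full-length vectors.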
