From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
Runze Xia, Yupeng Ji, Yuxi Zhou, Haodong Liu, Teng Zhang, Piji Li
TL;DR
The paper addresses the latency gap that keeps large reasoning models out of production e-commerce search by proposing a two-stage Reasoning-then-Distilling framework. The first stage builds a domain-adapted reasoning LLM through domain-adaptive continued pre-training (CPT), supervised fine-tuning (SFT), and preference optimization guided by a multi-dimensional reward model. The second stage distills the teacher's reasoning into a lightweight 6-layer BERT via Contrastive Reasoning Self-Distillation (CRSD), which aligns representations using an InfoNCE objective. Offline experiments show substantial gains in relevance metrics (e.g., Macro F1 up to 0.7174) and high retention of teacher capabilities (≈98.6%), while online A/B tests in the Meituan search advertising system show improved Ad CTR, CVR, and GTV. Ablations support the importance of the reasoning paths. The approach offers a practical, deployment-friendly way to bring LLM-style reasoning into latency-constrained search systems, and points to a scalable route for combining interpretability, accuracy, and efficiency in industrial IR.
Abstract
Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved through a three-step process: domain-adaptive pre-training to inject platform knowledge, supervised fine-tuning to elicit reasoning skills, and preference optimization with a multi-dimensional reward model to ensure the generation of reliable and preference-aligned reasoning paths. This teacher can then automatically annotate massive query-service pairs from search logs with both relevance labels and reasoning chains. In the second stage, to address the challenges of architectural heterogeneity in standard distillation, we introduce Contrastive Reasoning Self-Distillation (CRSD). By modeling the behavior of the same student model under "standard" and "reasoning-augmented" inputs as a teacher-student relationship, CRSD enables the lightweight model to internalize the teacher's complex decision-making mechanisms without needing the explicit reasoning path at inference. Offline evaluations and online A/B testing in the Meituan search advertising system demonstrate that our framework achieves significant improvements across multiple metrics, validating its effectiveness and practical value.
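The contrastive alignment behind CRSD can be illustrated with a small NumPy sketch of an in-batch InfoNCE loss. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the temperature value, the use of in-batch negatives, and the random vectors standing in for BERT representations are all hypothetical. In a real training loop the reasoning-augmented branch would typically be treated as a fixed target (no gradient), which NumPy does not model.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(standard_repr, reasoning_repr, temperature=0.1):
    """In-batch InfoNCE loss (hypothetical helper, not from the paper).

    Row i of `standard_repr` (model fed the plain query-service pair) is
    pulled toward row i of `reasoning_repr` (same model fed the pair plus
    its reasoning path); all other rows in the batch act as negatives.
    """
    s = l2_normalize(standard_repr)
    r = l2_normalize(reasoning_repr)
    logits = s @ r.T / temperature                       # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives on the diagonal


# Toy data: random vectors stand in for encoder outputs (assumption).
rng = np.random.default_rng(0)
reasoning_view = rng.normal(size=(8, 16))
aligned_view = reasoning_view + 0.01 * rng.normal(size=(8, 16))
unrelated_view = rng.normal(size=(8, 16))

loss_aligned = info_nce(aligned_view, reasoning_view)
loss_unrelated = info_nce(unrelated_view, reasoning_view)
print(loss_aligned, loss_unrelated)
```

The loss is lower when the standard-input representations already match their reasoning-augmented counterparts, which is the behavior the distillation objective rewards. Because the reasoning path is needed only to produce the target representations during training, the student can be served on the plain input alone.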
