Fine-grained large-scale content recommendations for MSX sellers
Manpreet Singh, Ravdeep Pasricha, Ravi Prasad Kondapalli, Kiran R, Nitish Singh, Akshita Agarwalla, Manoj R, Manish Prabhakar, Laurent Boué
TL;DR
This work tackles the challenge of surfacing relevant content for each MSX opportunity by formulating a large-scale semantic matching pipeline that links opportunity context with Seismic metadata. It employs a two-stage retrieval architecture (bi-encoder candidate retrieval followed by cross-encoder re-ranking) with metadata-driven prompts and weekly content updates, designed to scale to $\approx 7\times 10^5$ opportunities and $\approx 4\times 10^4$ documents. The approach is evaluated through human expert judgments and LLM-based proxies. Model scores align strongly with expert ratings (e.g., $r=0.78$, $\rho=0.64$), and the use of GPT-4 as a judge is feasible ($r=0.42$, $\rho=0.57$). Using Pandas UDFs on Azure Databricks reduces runtime from $\approx 2\ \mathrm{s}$ to $\approx 90\ \mathrm{ms}$ per opportunity on a $96$-vcore cluster. Integrated into MSX Copilot, the system provides sellers with top-5 recommendations of customer-ready or private content, enabling more targeted engagement and faster deal velocity. Future work includes personalization and multi-modal content handling to further improve relevance and impact.
Abstract
One of the most critical tasks of Microsoft sellers is to meticulously track and nurture potential business opportunities through proactive engagement and tailored solutions. Recommender systems play a central role in helping sellers achieve their goals. In this paper, we present a content recommendation model that surfaces various types of content (technical documentation, comparisons with competitor products, customer success stories, etc.) that sellers can share with their customers or use for their own self-learning. The model operates at the opportunity level, which is the lowest possible granularity and the most relevant one for sellers. It is based on semantic matching between content metadata and carefully selected attributes of the opportunities. Considering the volume of seller-managed opportunities in organizations such as Microsoft, we show how to perform efficient semantic matching over a very large number of opportunity-content combinations. The main challenge is to ensure that the top-5 most relevant contents for each opportunity are recommended out of a total of $\approx 40{,}000$ published contents. We achieve this target through an extensive comparison of different model architectures and feature selections. Finally, we examine the quality of the recommendations quantitatively, using both ratings from human domain experts and the recently proposed "LLM as a judge" framework.
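To make the two-stage matching summarized in the TL;DR concrete, the following is a minimal sketch under stated assumptions, not the implementation described in this paper. The embeddings are random stand-ins for bi-encoder outputs, and `toy_pair_score` is a hypothetical placeholder for a cross-encoder; the function names, the candidate pool size of 20, and the toy dimensions are all illustrative choices.

```python
import numpy as np


def top_k_candidates(opp_emb, content_emb, k):
    """Stage 1 (bi-encoder style): cosine similarity between every
    opportunity and every content item; keep the k best per opportunity."""
    opp = opp_emb / np.linalg.norm(opp_emb, axis=1, keepdims=True)
    con = content_emb / np.linalg.norm(content_emb, axis=1, keepdims=True)
    sims = opp @ con.T  # shape: (n_opportunities, n_contents)
    # argpartition selects the k largest per row without a full sort;
    # the k returned indices are not ordered among themselves.
    return np.argpartition(-sims, k - 1, axis=1)[:, :k]


def rerank(opp_id, candidate_ids, pair_score, top_n):
    """Stage 2 (cross-encoder style): score each (opportunity, content)
    pair jointly and keep the top_n highest-scoring candidates."""
    scores = np.array([pair_score(opp_id, int(c)) for c in candidate_ids])
    order = np.argsort(-scores)[:top_n]
    return [(int(candidate_ids[i]), float(scores[i])) for i in order]


# Toy data: random vectors standing in for real text embeddings (assumption).
rng = np.random.default_rng(0)
opp_emb = rng.normal(size=(3, 16))       # 3 opportunities
content_emb = rng.normal(size=(50, 16))  # 50 content items


def toy_pair_score(opp_id, content_id):
    """Hypothetical stand-in for a cross-encoder: negative Euclidean
    distance between the two raw vectors (higher means more relevant)."""
    return -float(np.linalg.norm(opp_emb[opp_id] - content_emb[content_id]))


# Stage 1 narrows 50 items to 20 candidates; stage 2 keeps the top 5.
candidates = top_k_candidates(opp_emb, content_emb, k=20)
recommendations = {
    o: rerank(o, candidates[o], toy_pair_score, top_n=5)
    for o in range(opp_emb.shape[0])
}
```

The point of the split is cost: the first stage is a single matrix product over all opportunity-content pairs, while the more expensive pairwise scorer only runs on the short candidate list for each opportunity.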
