A case study of Generative AI in MSX Sales Copilot: Improving seller productivity with a real-time question-answering system for content recommendation

Manpreet Singh; Ravdeep Pasricha; Nitish Singh; Ravi Prasad Kondapalli; Manoj R; Kiran R; Laurent Boué

TL;DR

The paper tackles real-time content recommendation for sellers, using metadata-driven prompts to surface Seismic documents within MSX Copilot. It proposes a two-stage, unsupervised LLM-based architecture: a fast bi-encoder retrieves candidates by comparing the query against precomputed metadata-prompt embeddings, and a cross-encoder then re-ranks those candidates to yield the top-5 results. The model runs as an AML endpoint integrated with Semantic Kernel. Evaluation combines latency measurements on multiple VM types with human relevancy ratings, and ablation studies demonstrate the superiority of the full two-stage approach and the value of incorporating numerical features. The system is deployed in production within MSX Copilot, where it delivers real-time recommendations; future plans include incorporating actual document content and richer seller context for more personalized, context-aware assistance.

Abstract

In this paper, we design a real-time question-answering system that helps sellers find relevant material and documentation that they can share live with their customers or refer to during a call. Taking the Seismic content repository as a relatively large-scale example of a diverse dataset of sales material, we demonstrate how LLM embeddings of sellers' queries can be matched with the relevant content. We achieve this by engineering elaborate prompts that make use of the rich set of meta-features available for documents and sellers. Using a bi-encoder with a cross-encoder re-ranker architecture, we show how the solution returns the most relevant content recommendations in just a few seconds, even for large datasets. Our recommender system is deployed as an AML endpoint for real-time inferencing and has been integrated into a Copilot interface that is now deployed in the production version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers.
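
The retrieve-then-re-rank flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: `embed` and `pair_score` are toy, hypothetical stand-ins for the bi-encoder and cross-encoder models, and the prompt strings are made-up examples. Only the control flow follows the paper's description: embed the prompts once offline, shortlist candidates by cosine similarity (top-100 by default), then re-rank the shortlist with a pairwise scorer and return the top 5.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_score(query, prompt):
    """Toy stand-in for a cross-encoder: looks at the (query, prompt) pair jointly.

    Here the score is simply the fraction of query tokens found in the prompt.
    """
    query_tokens = query.lower().split()
    prompt_tokens = set(prompt.lower().split())
    if not query_tokens:
        return 0.0
    return sum(t in prompt_tokens for t in query_tokens) / len(query_tokens)


def recommend(query, prompts, prompt_embeddings, n_candidates=100, n_results=5):
    """Return indices of the best prompts: cheap retrieval, then costlier re-ranking."""
    query_embedding = embed(query)
    # Stage 1: compare the query embedding with every precomputed prompt embedding.
    by_similarity = sorted(
        range(len(prompts)),
        key=lambda i: cosine(query_embedding, prompt_embeddings[i]),
        reverse=True,
    )
    candidates = by_similarity[:n_candidates]
    # Stage 2: re-score only the shortlisted candidates with the pairwise scorer.
    reranked = sorted(
        candidates, key=lambda i: pair_score(query, prompts[i]), reverse=True
    )
    return reranked[:n_results]


# Made-up metadata prompts, for illustration only.
prompts = [
    "azure security overview deck for enterprise customers",
    "dynamics 365 sales pricing guide",
    "azure cost management customer story",
    "teams adoption playbook",
]
prompt_embeddings = [embed(p) for p in prompts]  # computed once, offline

top = recommend("azure security deck", prompts, prompt_embeddings,
                n_candidates=3, n_results=2)
print(top)
```

The design point the sketch tries to capture is that the expensive pairwise scorer only ever sees the shortlist, so its cost is bounded by `n_candidates` rather than by the size of the repository.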

Paper Structure (17 sections, 5 equations, 5 figures, 2 tables)

Introduction
Related work
Metadata prompt engineering
Architecture of the LLM-based recommender model
Bi-encoder document retrieval
Cross-encoder re-ranking
Performance evaluation
Inference latency
Human annotators for quantitative relevancy evaluation
Ablation study
Architecture
Numerical features prompt engineering
Integration into Copilot
Technical details
How it looks
...and 2 more sections

Figures (5)

Figure 1: Illustration of the overall 2-stage architecture described in Section \ref{sec:architecture}.
Figure 2: Illustration of the bi-encoder retrieval phase, where the embeddings of the metadata prompts $\mathcal{P}$, pre-computed offline, are compared to the input query $q$ via cosine similarity. Note how the two branches operate independently of each other and only connect after the MSMarco DistilBERT embeddings.
Figure 3: Illustration of the re-ranking phase using a CrossEncoder, where each pair of input query text and prompt text is passed to the MSMarco MiniLM model, which produces a score based on the similarity between the pair.
Figure 4: Median latency in milliseconds of different machines on the evaluation queries as a function of batch size $b$. The reported times are the overall latency, which includes the time taken by the bi-encoder to encode the user query into an embedding, computing cosine similarity against the pre-computed document embeddings, shortlisting the top-100 candidate documents, passing them to the cross-encoder for re-ranking, and finally returning the top-5 documents, as described in Section \ref{sec:architecture} and Fig. \ref{fig:inference}. (Results for Standard F4s V2 are omitted for the sake of clarity, but the actual numbers are reported in Table \ref{table:inference_latency}.)
Figure 5: Example of the current UI integration of our model into the production MSX Copilot.
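
As a rough illustration of the measurement summarized in the Figure 4 caption, the sketch below times a request handler end to end for several batch sizes and reports the median latency in milliseconds. The paper does not describe its timing harness, so everything here is an assumption: `median_latency_ms` and `dummy_handler` are hypothetical names, the batch sizes are arbitrary, and the placeholder handler would be replaced by a call to the deployed endpoint.

```python
import statistics
import time


def median_latency_ms(handler, queries, batch_size, repeats=5):
    """Median wall-clock time, in milliseconds, that handler takes per batch."""
    batches = [
        queries[i:i + batch_size] for i in range(0, len(queries), batch_size)
    ]
    timings = []
    for _ in range(repeats):
        for batch in batches:
            start = time.perf_counter()
            handler(batch)  # end to end: encode, retrieve, re-rank, return
            timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def dummy_handler(batch):
    """Hypothetical placeholder for the real recommender call."""
    return [len(query) for query in batch]


# Made-up evaluation queries and batch sizes, for illustration only.
queries = [f"example query {i}" for i in range(16)]
results = {b: median_latency_ms(dummy_handler, queries, b) for b in (1, 2, 4)}
for batch_size, latency in results.items():
    print(f"batch size {batch_size}: {latency:.4f} ms")
```

Reporting the median rather than the mean keeps a few slow outliers, such as a cold start, from dominating the summary.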
