Searching, fast and slow, through product catalogs
Dayananda Ubrangala, Juhi Sharma, Sharath Kumar Rangappa, Kiran R, Ravi Prasad Kondapalli, Laurent Boué
TL;DR
This paper tackles SKU search in CRM contexts by proposing a production-ready, multi-component architecture that handles abbreviations and scale. It integrates part-number pattern matching, dynamic Trie-based suggestions, and a high-accuracy complete search that fuses character-level TF-IDF with language-model embeddings, achieving strong top-10 accuracy while meeting a strict latency target. An ablation study demonstrates the complementary value of each component, and the work also explores SKU description generation with GPT-3.5-Turbo to give users more context. The approach is evaluated on a Dynamics CRM catalog of about 87,000 SKUs, where it improves search quality. The paper also discusses deployment considerations and avenues for future refinement, such as semantic embeddings for abbreviations and automatic abbreviation discovery.
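The real-time suggestion component is described only as Trie-based. The following is a minimal sketch of Trie prefix suggestions, assuming case-folded SKU names and lexicographic ordering of the returned suggestions; it does not reproduce the paper's ranking or its handling of abbreviations. `SkuTrie`, `suggest`, and the SKU names in the example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by the next character.
    children: dict = field(default_factory=dict)
    # Original SKU names whose normalized text ends at this node.
    terminals: list = field(default_factory=list)


class SkuTrie:
    """Hypothetical prefix index for real-time SKU suggestions (illustrative sketch)."""

    def __init__(self):
        self.root = TrieNode()

    @staticmethod
    def _normalize(text: str) -> str:
        # Assumed normalization: case-folding only.
        return text.casefold()

    def insert(self, sku_name: str) -> None:
        node = self.root
        for ch in self._normalize(sku_name):
            node = node.children.setdefault(ch, TrieNode())
        node.terminals.append(sku_name)

    def suggest(self, prefix: str, k: int = 10) -> list:
        # Walk down to the node matching the typed prefix.
        node = self.root
        for ch in self._normalize(prefix):
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first, pre-order traversal in lexicographic order; stop after k hits.
        results = []
        stack = [node]
        while stack and len(results) < k:
            current = stack.pop()
            results.extend(current.terminals[: k - len(results)])
            # Push children in reverse order so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results


trie = SkuTrie()
for name in ["Office 365 E3", "Office 365 E5", "Office Home", "Dynamics 365 Sales"]:
    trie.insert(name)

print(trie.suggest("off"))       # ['Office 365 E3', 'Office 365 E5', 'Office Home']
print(trie.suggest("off", k=2))  # ['Office 365 E3', 'Office 365 E5']
```

Lookup cost depends on the prefix length rather than the catalog size, which is why a Trie suits per-keystroke suggestions; the trade-off is that a plain Trie only matches literal prefixes, so abbreviated queries need the separate, more expensive search path.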
Abstract
String matching in the presence of abbreviations, such as those found in Stock Keeping Unit (SKU) product catalogs, remains a relatively unexplored topic. In this paper, we present a unified architecture for SKU search that provides both a real-time suggestion system (based on a Trie data structure) and a higher-latency but more accurate search system (using character-level TF-IDF in combination with language model vector embeddings) in which users initiate the search explicitly. We carry out ablation studies that justify designing a search system composed of multiple components to address the delicate trade-off between speed and accuracy. Using SKU search in Dynamics CRM as an example, we show that our system vastly outperforms the default search engine in all aspects. Finally, we show how SKU descriptions may be enhanced via generative text models (using gpt-3.5-turbo) so that consumers of the search results get more context and a generally better experience.
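The explicit search combines character-level TF-IDF with language model embeddings, but the n-gram range, weighting scheme, and fusion rule are not specified here. The following is a stdlib-only sketch under assumed choices: character n-grams of length 2 to 4, a smoothed IDF, and a linear fusion weight `alpha`. Toy vectors stand in for real language model embeddings. `CharTfidfIndex`, `fused_search`, and `alpha` are hypothetical names, and the linear combination is an illustrative assumption rather than the paper's fusion method.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_min: int = 2, n_max: int = 4) -> Counter:
    # Pad with spaces so word boundaries yield distinctive n-grams.
    padded = f" {text.casefold()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i : i + n]] += 1
    return grams


class CharTfidfIndex:
    """Hypothetical character n-gram TF-IDF index with cosine scoring."""

    def __init__(self, documents: list):
        self.documents = documents
        counts = [char_ngrams(doc) for doc in documents]
        df = Counter()
        for c in counts:
            df.update(c.keys())
        n_docs = len(documents)
        # Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
        self.idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}
        self.vectors = [self._weight(c) for c in counts]

    def _weight(self, counts: Counter) -> dict:
        # N-grams never seen in the catalog get no weight.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {g: w / norm for g, w in vec.items()}

    def scores(self, query: str) -> list:
        q = self._weight(char_ngrams(query))
        # Dot product of L2-normalized sparse vectors equals cosine similarity.
        return [sum(w * v.get(g, 0.0) for g, w in q.items()) for v in self.vectors]


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fused_search(query, index, query_emb, doc_embs, alpha=0.5, k=10):
    # Hypothetical fusion: weighted sum of lexical and semantic cosine scores.
    lexical = index.scores(query)
    semantic = [cosine(query_emb, e) for e in doc_embs]
    combined = [alpha * lex + (1 - alpha) * sem for lex, sem in zip(lexical, semantic)]
    ranked = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
    return [(index.documents[i], combined[i]) for i in ranked[:k]]


catalog = ["Microsoft 365 Business Premium", "Dynamics 365 Sales Enterprise", "Power BI Pro"]
index = CharTfidfIndex(catalog)
# Toy 3-d vectors standing in for language model embeddings.
doc_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
results = fused_search("dyn 365 sales ent", index, [0.1, 0.9, 0.0], doc_embs, k=2)
print(results[0][0])  # Dynamics 365 Sales Enterprise
```

Character n-grams let an abbreviated query such as "dyn 365 sales ent" share many fragments with the full SKU name, while the embedding term can contribute matches that share meaning but few surface characters.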
