Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

Zhe Lin, Jiwei Tan, Dan Ou, Xi Chen, Shaowei Yao, Bo Zheng

TL;DR

This paper tackles the challenge of delivering fast and interpretable text relevance for Chinese e-commerce search by replacing dense embeddings with sparse Bag-of-Words representations. The DeepBoW model uses a two-tower architecture with multi-granularity encoders to produce term-weighting BoW and synonym-expansion BoW representations, enhanced by an N-gram hashing vocabulary to preserve semantic phrases. Relevance scoring is computed efficiently through a sparse word-weight sum, trained with cross-entropy and sparsity penalties, enabling offline pre-computation and online O(N) inference on CPU. Empirical results on industrial Taobao data show competitive offline performance and real-world online gains, including a 0.4% increase in transactions, with the model deployed in production and maintaining low latency. The work demonstrates that interpretable, sparse representations can achieve both practical efficiency and strong relevance in large-scale e-commerce settings.
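As a rough illustration of the N-gram hashing vocabulary mentioned above, the sketch below maps character n-grams to bucket ids with a stable hash. The summary does not state the paper's hash function, bucket count, or n-gram range, so `NUM_BUCKETS`, `char_ngrams`, `ngram_ids`, and the choice of CRC32 are all hypothetical and only show the general technique.

```python
import zlib

NUM_BUCKETS = 1 << 20  # hypothetical vocabulary size, not taken from the paper


def char_ngrams(text, max_n=3):
    """Return all character n-grams of length 1..max_n, shortest first."""
    return [text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)]


def ngram_ids(text, max_n=3, num_buckets=NUM_BUCKETS):
    """Map each n-gram to a bucket id using CRC32 (an assumed, stable hash)."""
    return {gram: zlib.crc32(gram.encode("utf-8")) % num_buckets
            for gram in char_ngrams(text, max_n)}


if __name__ == "__main__":
    # A three-character product word yields unigrams, bigrams and one trigram.
    for gram, bucket in ngram_ids("连衣裙").items():
        print(gram, bucket)
```

Hashing keeps the vocabulary at a fixed size while still giving multi-character phrases their own entries, at the cost of occasional collisions between unrelated n-grams.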

Abstract

Text relevance, or text matching between query and product, is an essential technique for e-commerce search systems, ensuring that the displayed products match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models such as BERT have achieved promising performance on the text relevance task. While these models perform well on offline test datasets, their high latency remains an obstacle to deploying them in online systems. The two-tower model is widely used in industrial scenarios because it balances performance with computational efficiency. Unfortunately, such models are opaque "black boxes", which prevents developers from making targeted optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into sparse BoW representations, each of which is a set of word-weight pairs. Each weight represents the importance of the corresponding word, or its relevance to the raw text. The relevance score is computed by accumulating over the words matched between the sparse BoW representations of the query and the product. Compared with popular dense distributed representations, which usually suffer from being black boxes, the main advantage of the proposed representation is that it is highly explainable and open to manual intervention, a significant benefit for the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ...
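The matching step described in the abstract can be sketched as a lookup over two word-weight dictionaries. The abstract only says that matched words are accumulated; multiplying the query weight by the product weight for each shared word is an assumption made here for illustration, and `relevance_score` and the example weights are hypothetical.

```python
def relevance_score(query_bow, product_bow):
    """Accumulate over words present in both sparse BoW dicts.

    Assumption: each matched word contributes query_weight * product_weight.
    """
    # Iterate over the shorter dict so the cost is linear in its length.
    small, large = sorted((query_bow, product_bow), key=len)
    return sum(w * large[word] for word, w in small.items() if word in large)


if __name__ == "__main__":
    query_bow = {"dress": 0.9, "red": 0.5}
    product_bow = {"dress": 0.8, "skirt": 0.3, "red": 0.5}
    print(relevance_score(query_bow, product_bow))  # about 0.97

    # Manual intervention: an operator can edit one weight directly.
    product_bow["red"] = 0.0
    print(relevance_score(query_bow, product_bow))  # about 0.72
```

Because every contribution to the score is tied to a named word, a surprising result can be traced to specific entries and corrected by editing them, which is the kind of intervention a dense embedding does not allow.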

Paper Structure (23 sections, 8 equations, 1 figure, 6 tables)

Figures (1)

  • Figure 1: An overview of the DeepBoW model. Panel (a) shows the architecture that encodes the input text into the Term-Weighting BoW representation, in which each word's attention weight becomes its weight. Panel (b) shows the architecture that encodes the input text into the Synonym-Expansion BoW representation: it generates one sparse BoW representation from the character embeddings and another from the word embeddings, then aggregates the two.
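The two output steps named in the Figure 1 caption can be sketched in a few lines, leaving the encoders themselves out. The caption does not say how repeated words are handled or which operator aggregates the character-level and word-level representations, so summing repeated words and taking the per-word maximum are assumptions, and both function names are hypothetical.

```python
from collections import defaultdict


def term_weighting_bow(tokens, attention):
    """Gather per-token attention weights into a word -> weight dict."""
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    bow = defaultdict(float)
    for token, weight in zip(tokens, attention):
        bow[token] += weight  # assumption: repeated words sum their weights
    return dict(bow)


def aggregate_bows(char_bow, word_bow):
    """Merge two sparse BoWs, keeping the larger weight per word (assumed)."""
    merged = dict(char_bow)
    for token, weight in word_bow.items():
        merged[token] = max(weight, merged.get(token, 0.0))
    return merged


if __name__ == "__main__":
    print(term_weighting_bow(["red", "dress", "red"], [0.2, 0.6, 0.2]))
    print(aggregate_bows({"dress": 0.7, "gown": 0.2}, {"dress": 0.5, "skirt": 0.4}))
```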