Handling Large-scale Cardinality in building recommendation systems
Dhruva Dixith Kurra, Bo Ling, Chun Zh, Seyedshahin Ashrafzadeh
TL;DR
The paper tackles the scalability of high-cardinality UUID features in the retrieval phase of large-scale recommenders by introducing a Bag-of-Words (BoW) proxy for $eater\_uuid$ and a layer-sharing scheme between the embedding components of the two towers. These methods reduce embedding size by roughly 25x and accelerate training while improving recall, as evidenced by Recall@500 gains and faster convergence (e.g., ~2000 steps to reach the threshold vs. ~20000 for the baseline BoW). The approach leverages historical user behavior through store proxies ($store\_uuids$) and enables information exchange across towers to capture nuanced user-item relations. Offline and online experiments on Uber Eats show practical gains in retrieval-stage efficiency and performance, with statistical significance ($p$-value < 0.05), suggesting broad applicability to high-cardinality features in large-scale recommender systems.
Abstract
Effective recommendation systems rely on capturing user preferences, which often requires incorporating numerous features such as universally unique identifiers (UUIDs) of entities. However, the exceptionally high cardinality of UUIDs poses a significant challenge: the resulting sparsity degrades model quality and inflates model size. This paper presents two techniques to address high cardinality in recommendation systems. Specifically, we propose a bag-of-words approach, combined with layer sharing, that substantially decreases model size while improving performance. We evaluated these techniques through offline and online experiments on Uber use cases; the results demonstrate their effectiveness in optimizing recommendation systems and enhancing overall performance.
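To make the bag-of-words idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a user is represented by mean-pooling the embeddings of the stores in their history and that the same store embedding table could be reused by both towers (one plausible reading of layer sharing). The pooling choice, the embedding width, and all names (`make_table`, `bow_user_embedding`, the store IDs) are hypothetical.

```python
import random

EMBED_DIM = 4  # hypothetical embedding width, chosen only for illustration


def make_table(ids, dim=EMBED_DIM, seed=0):
    """Build a toy embedding table: one random vector per ID."""
    rng = random.Random(seed)
    return {i: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for i in ids}


def bow_user_embedding(store_history, store_table, dim=EMBED_DIM):
    """Represent a user as the mean of the embeddings of visited stores.

    Assumption: mean pooling. Stores missing from the table are skipped,
    and an empty history maps to the zero vector.
    """
    vectors = [store_table[s] for s in store_history if s in store_table]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) iterates over embedding dimensions (columns).
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# One table keyed by store ID; no per-user embedding table is needed,
# because the user vector is derived from store embeddings.
store_table = make_table(["store_a", "store_b", "store_c"])
user_vec = bow_user_embedding(["store_a", "store_c", "store_a"], store_table)
```

In this sketch the only learned table is indexed by store ID, so its size scales with the number of stores rather than the number of users, which is the intuition behind the reduction in embedding size described above.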
