Improving Out-of-Vocabulary Handling in Recommendation Systems

William Shiao; Mingxuan Ju; Zhichun Guo; Xin Chen; Evangelos Papalexakis; Tong Zhao; Neil Shah; Yozen Liu

TL;DR

This work tackles the inductive out-of-vocabulary (OOV) problem in recommendation systems (RS), focusing on embedding-table-level solutions that preserve transductive performance. It formalizes OOV definitions, contrasts context-free and context-aware models, and introduces a general OOV embedder framework, evaluating nine embedders (ranging from zero and mean embedders to LSH-based and neural approaches) with five models across four datasets, one of which is a proprietary production dataset. The experiments show that feature-aware, locality-sensitive hashing (LSH) methods consistently improve inductive performance, with the best method achieving a mean improvement of 3.74% over the industry-standard baseline, while context-free settings remain challenging and highly sensitive to the choice of embedder. The paper provides practical recommendations for practitioners, along with open-source code and enhanced inductive datasets, to spur further research and real-world deployment of robust OOV handling in large-scale RS.

Abstract

Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not have enough information to produce high-quality recommendations. This work focuses on a complementary problem: making recommendations for new users and items that were unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which encode only the users/items seen at training time, each with a fixed parameter vector. Many solutions applied in practice are naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using locality-sensitive hashing (LSH) consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry-standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.
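The abstract contrasts the common practice of assigning OOV IDs to random buckets with methods that exploit feature similarity through LSH. The minimal Python sketch below illustrates that difference under stated assumptions: `random_bucket` and `HyperplaneLSH` are hypothetical names, and random-hyperplane (SimHash-style) LSH is used as one representative LSH family; the paper's actual LSH embedders may use different hash functions, bucket counts, or input features. In both cases the resulting bucket index would select a row of a small OOV embedding table.

```python
import hashlib
import random


def random_bucket(entity_id: str, num_buckets: int) -> int:
    """Baseline (hypothetical helper): assign an OOV ID to a bucket by hashing
    the ID string alone. Features are ignored, so similar entities are not
    guaranteed to share a bucket."""
    # hashlib gives a stable hash across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(entity_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


class HyperplaneLSH:
    """Random-hyperplane LSH (hypothetical helper): each of `num_bits` random
    hyperplanes contributes one sign bit, yielding 2**num_bits buckets.
    Feature vectors separated by a small angle tend to share a bucket."""

    def __init__(self, dim: int, num_bits: int, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]

    def bucket(self, features: list[float]) -> int:
        code = 0
        for plane in self.planes:
            dot = sum(p * f for p, f in zip(plane, features))
            # Append one bit: 1 if the vector lies on the non-negative side.
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code


if __name__ == "__main__":
    lsh = HyperplaneLSH(dim=3, num_bits=4, seed=42)
    a = [1.0, 0.2, 0.0]
    b = [2.0, 0.4, 0.0]  # same direction as `a`
    print(lsh.bucket(a) == lsh.bucket(b))             # True: every sign bit matches
    print(0 <= random_bucket("user_123", 100) < 100)  # True
```

Because the random-bucket baseline hashes only the ID, two new users with identical features can land in unrelated buckets. With hyperplane LSH, feature vectors pointing in the same direction always receive the same bucket, so the OOV embedding they share can be learned from entities with similar features.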

Paper Structure (49 sections, 1 equation, 6 figures, 2 tables, 2 algorithms)

Introduction
Preliminaries and Related Work
Notation
OOV Values
Transductive vs. Inductive Settings
Context-Free Models
Context-Aware Models
Towards a General OOV Embedder
OOV Embedders
Heuristic-based Embedders
Zero Embedder
Mean Embedder
Fixed Random Embedder
KNN Embedder
Learning-based Embedders
...and 34 more sections
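The outline names four heuristic embedders (Zero, Mean, Fixed Random, and KNN). Their section text is not included here, so the plain-Python sketch below is only one plausible reading of those names: the function names, the choice of a single shared random vector for the Fixed Random embedder, and cosine similarity over raw feature vectors for KNN are all assumptions, not the paper's definitions.

```python
import math
import random

Vector = list[float]


def zero_embed(dim: int) -> Vector:
    # Every OOV entity receives the all-zeros vector.
    return [0.0] * dim


def mean_embed(iv_table: dict[str, Vector]) -> Vector:
    # Every OOV entity receives the average of all in-vocabulary (IV) embeddings.
    vectors = list(iv_table.values())
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def fixed_random_embed(dim: int, seed: int = 0) -> Vector:
    # Assumption: a single random vector, drawn once and never trained,
    # is shared by all OOV entities.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def knn_embed(oov_features: Vector, iv_features: dict[str, Vector],
              iv_table: dict[str, Vector], k: int) -> Vector:
    # Rank IV entities by feature similarity to the OOV entity, then average
    # the trained embeddings of the k most similar ones.
    ranked = sorted(iv_features,
                    key=lambda eid: cosine(oov_features, iv_features[eid]),
                    reverse=True)
    neighbors = ranked[:k]
    dim = len(iv_table[neighbors[0]])
    return [sum(iv_table[e][d] for e in neighbors) / len(neighbors)
            for d in range(dim)]


if __name__ == "__main__":
    iv_table = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
    iv_features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.7, 0.7]}
    print(zero_embed(2))                                      # [0.0, 0.0]
    print(knn_embed([1.0, 0.1], iv_features, iv_table, k=2))  # [1.0, 0.5]
```

Of these four, only the KNN variant looks at the new entity's features; the other three assign every OOV entity the same vector regardless of what is known about it.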

Figures (6)

Figure 1: Comparison between transductive (left) and inductive (right) settings. In the transductive setting, RS are evaluated on interactions between users and items observed during training (i.e., bold links). In the inductive setting, RS are additionally evaluated on interactions involving users and items unseen during training (i.e., both bold and dashed links).
Figure 2: Comparison of inductive vs. transductive performance with Wide & Deep models, where OOV (inductive) values are handled with trained random buckets. There is a clear gap between inductive and transductive performance, showing the importance of properly handling OOV values.
Figure 3: Typical structure of context-aware and context-free recommendation models.
Figure 4: How IV/OOV user IDs are handled under our framework. Item IDs are handled the same way.
Figure 5: Visualization of where the inductive split occurs on the datasets. The $x$-axis is the time that the user/item first appeared. Everything to the left of the split time is used for training and validation. The remainder is used for evaluation.
...and 1 more figure
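Read together, the captions of Figures 4 and 5 describe an evaluation pipeline: a temporal split decides which IDs are in-vocabulary (IV), and at lookup time IV IDs use the trained table while OOV IDs are routed to an OOV embedder. The sketch below is our reading of those captions, assuming a dictionary-backed table and treating an ID as IV only if it first appeared strictly before the split time; `temporal_vocab` and `EmbeddingWithOOV` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


def temporal_vocab(first_seen: dict[str, float], split_time: float) -> set[str]:
    # IDs first seen before the split time are available for training and
    # validation, so they form the vocabulary; later IDs are OOV at test time.
    # Assumption: an ID first seen exactly at split_time counts as OOV.
    return {eid for eid, t in first_seen.items() if t < split_time}


@dataclass
class EmbeddingWithOOV:
    table: dict[str, Vector]               # trained embeddings for IV IDs
    oov_embedder: Callable[[str], Vector]  # fallback for IDs not in the table

    def lookup(self, entity_id: str) -> Vector:
        if entity_id in self.table:
            # IV lookups return the trained row unchanged.
            return self.table[entity_id]
        return self.oov_embedder(entity_id)


if __name__ == "__main__":
    first_seen = {"u1": 1.0, "u2": 5.0, "u3": 12.0}
    vocab = temporal_vocab(first_seen, split_time=10.0)  # {"u1", "u2"}
    trained = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
    emb = EmbeddingWithOOV({e: trained[e] for e in vocab},
                           oov_embedder=lambda _id: [0.0, 0.0])
    print(emb.lookup("u1"))  # [1.0, 0.0]
    print(emb.lookup("u3"))  # [0.0, 0.0]
```

Because the OOV embedder is just a callable, a zero vector, a mean vector, or an LSH-bucketed lookup can be swapped in without modifying the trained table, which is what lets such a scheme leave transductive results unchanged.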
