Towards a Theoretical Understanding of Two-Stage Recommender Systems
Amit Kumar Jaiswal
TL;DR
The paper addresses the theoretical understanding of two-stage, two-tower recommender systems by formalizing user/item covariates as $x_u$ and $\tilde{x}_i$ mapped into a shared $p$-dimensional embedding, with ratings predicted via $R(x_u,\tilde{x}_i)=\langle f(x_u),\tilde{f}(\tilde{x}_i)\rangle$. It develops a framework based on Hölder smoothness $\beta$ and intrinsic dimensions $d_u,d_i$ to bound both approximation and estimation errors, deriving a convergence rate of $O_p(|\Omega|^{-2\beta/(2\beta+d_{ui})}(\log|\Omega|)^2)$ under high smoothness, where $d_{ui}=\max\{d_u,d_i\}$. The authors show that leveraging low intrinsic dimensions accelerates convergence and that finite-depth networks with widths growing as $|\Omega|^{d_{ui}/(2\beta+d_{ui})}$ suffice to approximate the true model. Empirical results on synthetic data and a Yelp dataset corroborate the theory, with T$^2$Rec delivering substantial improvements over baselines, particularly in cold-start regimes due to effective covariate embeddings.
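The inner-product predictor $R(x_u,\tilde{x}_i)=\langle f(x_u),\tilde{f}(\tilde{x}_i)\rangle$ can be illustrated with a minimal NumPy sketch. The one-hidden-layer ReLU towers, the layer sizes, the random untrained weights, and the helper names `init_tower`, `tower`, and `predict_rating` are all assumptions made for illustration; they are not the architecture or training procedure used in the paper.

```python
import numpy as np

def init_tower(rng, in_dim, hidden, p):
    """Random weights for a one-hidden-layer ReLU tower (hypothetical sizes)."""
    return {
        "W1": rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, p)),
        "b2": np.zeros(p),
    }

def tower(params, x):
    """Map covariates x to a p-dimensional embedding."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def predict_rating(user_params, item_params, x_u, x_i):
    """R(x_u, x_i) = <f(x_u), f~(x_i)>, computed row-wise for paired inputs."""
    return np.sum(tower(user_params, x_u) * tower(item_params, x_i), axis=-1)

rng = np.random.default_rng(0)
D_u, D_i, p = 8, 6, 4                 # covariate and embedding dimensions (assumed)
user_params = init_tower(rng, D_u, 16, p)
item_params = init_tower(rng, D_i, 16, p)

x_u = rng.normal(size=(5, D_u))       # covariates of 5 users
x_i = rng.normal(size=(5, D_i))       # covariates of 5 items
ratings = predict_rating(user_params, item_params, x_u, x_i)
print(ratings.shape)
```

Because each tower consumes only covariates, a user or item with no rating history still receives an embedding, which is the mechanism the summary credits for the cold-start gains.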
Abstract
Production-grade recommender systems deployed by online media services such as Netflix, Pinterest, and Amazon rely heavily on large-scale corpora. These systems enrich recommendations by learning user and item embeddings in a low-dimensional space with two-stage models (two deep neural networks), and use the resulting embeddings to predict users' feedback on items. Despite the popularity of this approach, its theoretical behavior remains largely unexplored. We study the asymptotic behavior of the two-stage recommender, which entails strong convergence to the optimal recommender system, and we establish theoretical properties and statistical guarantees for it. Beyond the asymptotic analysis, we demonstrate that the two-stage recommender attains faster convergence by relying on the intrinsic dimensions of the input features. Finally, experiments on synthetic and real-world data show numerically that the two-stage recommender captures the impact of item and user attributes on ratings, resulting in better performance than existing methods.
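To make the role of the intrinsic dimension concrete, the sketch below evaluates the rate quoted in the TL;DR, $|\Omega|^{-2\beta/(2\beta+d_{ui})}(\log|\Omega|)^2$, together with the width order $|\Omega|^{d_{ui}/(2\beta+d_{ui})}$. The values of $\beta$, $d_u$, $d_i$, and $|\Omega|$, and the ambient dimension of 100 used for comparison, are illustrative assumptions; constants in the bound are ignored, and plugging an ambient dimension into the same formula is only a heuristic contrast, not a result stated in the paper.

```python
import math

def rate_exponent(beta, d):
    """Exponent 2*beta / (2*beta + d) in the polynomial part of the rate."""
    return 2.0 * beta / (2.0 * beta + d)

def rate_bound(n_obs, beta, d):
    """n^(-2*beta/(2*beta+d)) * (log n)^2, up to an unspecified constant."""
    return n_obs ** (-rate_exponent(beta, d)) * math.log(n_obs) ** 2

def width_order(n_obs, beta, d):
    """Order of the network width, n^(d/(2*beta+d)), up to a constant."""
    return n_obs ** (d / (2.0 * beta + d))

beta = 2.0                  # Hölder smoothness (assumed value)
d_u, d_i = 3, 4             # intrinsic dimensions (assumed values)
d_ui = max(d_u, d_i)        # d_ui = max{d_u, d_i}
ambient = 100               # hypothetical ambient covariate dimension
n_obs = 10 ** 6             # |Omega|, number of observed ratings (assumed)

print("exponent, intrinsic:", rate_exponent(beta, d_ui))
print("exponent, ambient:  ", rate_exponent(beta, ambient))
print("rate, intrinsic:    ", rate_bound(n_obs, beta, d_ui))
print("width order:        ", width_order(n_obs, beta, d_ui))
```

With these assumed values the exponent is $4/8 = 0.5$ for $d_{ui}=4$ but only $4/104 \approx 0.038$ when the ambient dimension is used, which is the sense in which a low intrinsic dimension speeds up convergence.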
