Parallel and Mini-Batch Stable Matching for Large-Scale Reciprocal Recommender Systems
Kento Nakada, Kazuki Kawamura, Ryosuke Furukawa
TL;DR
This work recasts stable matching for two-sided reciprocal recommender systems as an entropy-regularized optimal transport problem under transferable utility (TU), targeting both high match quality and computational efficiency. It introduces parallel batch IPFP and online mini-batch IPFP, which scale to as many as one million users by combining a matrix–vector formulation with kernel $A=\exp(\boldsymbol{\Phi}/(2\beta))$ and a factorized preference model that saves memory. Experiments on real and synthetic data show a higher number of expected matches and good scalability on CPU and GPU hardware, with memory usage that grows linearly, which makes the approach practical for large-scale platforms. The result is a principled, scalable framework for maximizing total matches in large two-sided markets while preserving the stability and fairness considerations inherent in TU-based matching.
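As a rough illustration of the matrix–vector formulation, the following minimal sketch runs an IPFP-style fixed-point iteration on the kernel $A=\exp(\boldsymbol{\Phi}/(2\beta))$. It assumes the standard closed-form update for entropy-regularized TU matching, in which the matching is $\mu_{ij} = a_i A_{ij} b_j$ and $a_i^2$, $b_j^2$ are the unmatched masses; this section does not spell out the update, so treat it as an assumption rather than the authors' implementation. The names `ipfp`, `phi`, `n`, `m`, and `n_iter` are hypothetical.

```python
import numpy as np

def ipfp(phi, n, m, beta=1.0, n_iter=500):
    """IPFP-style iteration for entropy-regularized TU matching (sketch).

    phi : (|X|, |Y|) joint surplus matrix
    n, m: available mass of users on each side
    """
    A = np.exp(phi / (2.0 * beta))   # kernel matrix A = exp(Phi / (2 beta))
    b = np.sqrt(m)                   # initial guess for sqrt of unmatched mass
    for _ in range(n_iter):
        # Each half-step needs only one matrix-vector product.
        s = A @ b
        a = np.sqrt(n + (s / 2.0) ** 2) - s / 2.0   # solves a^2 + a*s = n
        t = A.T @ a
        b = np.sqrt(m + (t / 2.0) ** 2) - t / 2.0   # solves b^2 + b*t = m
    mu = a[:, None] * A * b[None, :]  # expected matches mu_ij
    return mu, a ** 2, b ** 2         # matching and unmatched masses

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 5))        # synthetic surplus, for illustration only
n = np.ones(4)
m = np.ones(5)
mu, mu_x0, mu_0y = ipfp(phi, n, m)
```

At convergence, each row of `mu` plus the corresponding unmatched mass should add up to `n`, and likewise for the columns and `m`. Because every update is a matrix–vector product followed by elementwise operations, the iteration maps naturally onto parallel hardware.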
Abstract
Reciprocal recommender systems (RRSs) are crucial in online two-sided matching platforms, such as online job or dating markets, because they must account for the preferences of both sides of a match. When recommendations on these platforms concentrate on a subset of users, match opportunities are undermined and the total number of matches falls. To maximize the total number of expected matches among market participants, stable matching theory with transferable utility has been applied to RRSs. However, computation time and memory usage grow quadratically with the number of users, making stable matching algorithms difficult to run for large user populations. In this study, we propose novel methods that use parallel and mini-batch computations for reciprocal recommendation models to improve the time and space efficiency of the optimization process for stable matching. Experiments on both real and synthetic data confirmed that our stable matching theory-based RRS increased computation speed and enabled tractable processing of large-scale data of up to one million samples on a single graphics processing unit (GPU), without reducing the number of matches.
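To make the memory argument concrete, the sketch below computes the product of the kernel $\exp(\boldsymbol{\Phi}/(2\beta))$ with a vector one row block at a time, so the full quadratic-size matrix is never stored. It assumes, purely for illustration, that the surplus factorizes as $\boldsymbol{\Phi} = UV^\top$ with low-dimensional user embeddings; the actual factorization used by the authors is not given in this section. The names `blockwise_kernel_matvec`, `U`, `V`, and `block` are hypothetical.

```python
import numpy as np

def blockwise_kernel_matvec(U, V, b, beta=1.0, block=2):
    """Compute s = exp(U @ V.T / (2*beta)) @ b one row block at a time."""
    s = np.empty(U.shape[0])
    for start in range(0, U.shape[0], block):
        stop = start + block
        # Only a (block x |Y|) slice of the kernel exists at any moment.
        A_block = np.exp(U[start:stop] @ V.T / (2.0 * beta))
        s[start:stop] = A_block @ b
    return s

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))          # synthetic embeddings for one side
V = rng.normal(size=(4, 3))          # synthetic embeddings for the other side
b = rng.uniform(0.1, 1.0, size=4)

s_block = blockwise_kernel_matvec(U, V, b)
s_dense = np.exp(U @ V.T / 2.0) @ b  # reference that materializes the kernel
```

Peak memory in the blockwise version is set by `block` times the size of the other side plus the embeddings, rather than by the product of the two side sizes. The row blocks are also independent of one another, so they can be processed in parallel.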
