Core-elements Subsampling for Alternating Least Squares
Dunyao Xue, Mengyu Li, Cheng Meng, Jingyi Zhang
TL;DR
This work tackles the computational bottleneck of alternating least squares (ALS) in matrix factorization with missing values, as used in recommender systems. It introduces Core-ALS, a core-elements subsampling framework that uses sparse sketches to approximate ALS updates while preserving accuracy, supported by per-iteration $(1+\epsilon)$-approximation guarantees and convergence under mild conditions. Theoretical results bound the variance and bias of the resulting estimates, and a fast variant uses partial quicksort to accelerate sampling. Experiments on synthetic and real datasets (including Netflix) show substantial speedups with minimal loss in predictive quality. The method thus enables faster training for large-scale recommender systems at little cost in recommendation performance, and the work suggests future directions in memory efficiency and tensor extensions.
Abstract
In this paper, we propose a novel element-wise subset selection method for the alternating least squares (ALS) algorithm, focusing on low-rank factorization of matrices with missing values, as commonly encountered in recommender systems. While ALS is widely used for providing personalized recommendations based on user-item interaction data, its high computational cost, stemming from repeated regression operations, poses significant challenges for large-scale datasets. To enhance the efficiency of ALS, we introduce a core-elements subsampling method that selects a representative subset of data and leverages sparse matrix operations to approximate the ALS estimates efficiently. We establish theoretical guarantees for the approximation and convergence of the proposed approach, showing that it achieves accuracy similar to that of full-data ALS with significantly reduced computational time. Extensive simulations and real-world applications demonstrate the effectiveness of our method in various scenarios, emphasizing its potential in large-scale recommendation systems.
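To make the "repeated regression operations" concrete, the following is a minimal NumPy sketch of ALS on a partially observed matrix, with an optional element-wise sparsification of each regression's design matrix. It is an illustration under stated assumptions, not the Core-ALS algorithm itself: the selection rule (keep the `r` largest-magnitude entries per column), the estimator form `(Xs^T X + reg I)^{-1} Xs^T y`, the ridge term, and all function names are hypothetical choices made here and are not taken from the text above.

```python
import numpy as np


def keep_largest_per_column(X, r):
    """Zero all but the r largest-magnitude entries in each column of X.

    Assumed selection rule for illustration only.
    """
    n, p = X.shape
    r = min(r, n)
    # argpartition is a partial sort: rows n-r..n-1 index the r largest |values|
    idx = np.argpartition(np.abs(X), n - r, axis=0)[n - r:]
    Xs = np.zeros_like(X)
    cols = np.arange(p)
    Xs[idx, cols] = X[idx, cols]
    return Xs


def sparsified_ridge(X, y, r, reg):
    """Solve (Xs^T X + reg I) b = Xs^T y, where Xs is the sparsified X.

    Dense arrays are used for clarity; any speedup would require storing Xs
    in a sparse format so that Xs^T X costs O(r p^2) rather than O(n p^2).
    """
    Xs = keep_largest_per_column(X, r)
    p = X.shape[1]
    return np.linalg.solve(Xs.T @ X + reg * np.eye(p), Xs.T @ y)


def regress(X, y, reg, r):
    """Full ridge regression, or the sparsified variant when r is given."""
    p = X.shape[1]
    if r is None or X.shape[0] <= r:
        return np.linalg.solve(X.T @ X + reg * np.eye(p), X.T @ y)
    return sparsified_ridge(X, y, r, reg)


def als(Y, mask, rank, n_iters=30, reg=1e-2, r=None, seed=0):
    """Fit Y ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    for _ in range(n_iters):
        # Each row of U is one regression on the items that user has rated
        for i in range(m):
            obs = mask[i]
            U[i] = regress(V[obs], Y[i, obs], reg, r)
        # Each row of V is one regression on the users who rated that item
        for j in range(n):
            obs = mask[:, j]
            V[j] = regress(U[obs], Y[obs, j], reg, r)
    return U, V
```

With `r=None` this is ordinary full-data ALS, whose cost is dominated by the `m + n` small regressions solved in every sweep. Passing an integer `r` replaces each regression with the sparsified variant, which is the kind of substitution the abstract describes at a high level; whether this particular rule matches the guarantees stated there is not something the sketch establishes.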
