Multiple Testing of Linear Forms for Noisy Matrix Completion
Wanteng Ma, Lilun Du, Dong Xia, Ming Yuan
TL;DR
This work addresses multiple hypothesis testing for linear forms in noisy, low-rank matrix completion. It introduces a test statistic built in three steps (gradient-descent initialization, bias correction, and a low-rank, incoherence-aware projection) that yields sharp marginal and joint asymptotic normality for estimates of $\langle M, T\rangle$ under the observation model $Y = \langle M, X\rangle + \xi$. To control the false discovery rate (FDR) across many tests, the authors develop a data-splitting and symmetric aggregation strategy that leverages weak dependence among the statistics, and further improve performance via whitening and LASSO-based screening to handle stronger dependence. They provide non-asymptotic FDR and power guarantees that scale with sample size and model parameters, and validate the framework with extensive simulations and real data (the MovieLens and Rossmann datasets), demonstrating practical gains in reliable discovery for recommender-system settings. The approach offers a principled path toward uncertainty-aware, scalable inference in high-dimensional matrix problems with missing data, with applicability beyond recommender systems.
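To make the observation model $Y = \langle M, X\rangle + \xi$ and the linear form $\langle M, T\rangle$ concrete, here is a minimal Python sketch. It assumes the standard matrix-completion setting in which each $X$ is a uniformly sampled canonical basis matrix $e_j e_k^\top$ and the noise is Gaussian; the function names, dimensions, and the particular choice of $T$ are hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_observations(M, n, sigma, rng):
    """Draw n noisy entries Y_i = <M, X_i> + xi_i with X_i = e_j e_k^T.

    Assumption: uniform sampling with replacement and Gaussian noise.
    """
    d1, d2 = M.shape
    rows = rng.integers(0, d1, size=n)
    cols = rng.integers(0, d2, size=n)
    noise = sigma * rng.standard_normal(n)
    # <M, e_j e_k^T> is simply the (j, k) entry of M.
    Y = M[rows, cols] + noise
    return rows, cols, Y

def linear_form(M, T):
    """Trace inner product <M, T> = sum over (j, k) of M[j, k] * T[j, k]."""
    return float(np.sum(M * T))

rng = np.random.default_rng(0)
d1, d2, r = 30, 20, 2
U = rng.standard_normal((d1, r))
V = rng.standard_normal((d2, r))
M = U @ V.T  # hypothetical rank-2 ground truth

# Hypothetical T comparing two items for one user: <M, T> = M[0, 1] - M[0, 2].
T = np.zeros((d1, d2))
T[0, 1], T[0, 2] = 1.0, -1.0

rows, cols, Y = simulate_observations(M, n=500, sigma=0.1, rng=rng)
value = linear_form(M, T)
```

In a recommender-system reading, a hypothesis such as $\langle M, T\rangle \le 0$ with this $T$ would ask whether user 0 prefers item 1 to item 2; testing many such forms at once is what creates the multiplicity problem.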
Abstract
Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff of, and the intricate dependence among, the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
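The data splitting and symmetric aggregation idea can be illustrated with a short Python sketch. This section does not spell out the paper's exact statistics, so the sketch makes an assumption: each hypothesis has two approximately standard-normal test statistics computed on disjoint halves of the data, and they are aggregated by their product. A null product is then roughly symmetric about zero, so the count in the negative tail serves as an estimate of the number of false positives in the positive tail. The function name `symmetric_fdr_select` and the toy inputs are hypothetical.

```python
import numpy as np

def symmetric_fdr_select(z1, z2, alpha):
    """Select hypotheses from two split statistics via a symmetric threshold.

    Assumption: w = z1 * z2 is symmetric about zero under the null and
    large and positive under the alternative.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    w = z1 * z2
    # Candidate thresholds: the observed nonzero magnitudes, in increasing order.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        # The negative tail estimates the false discoveries in the positive tail.
        false_est = 1 + np.sum(w <= -t)
        discoveries = max(np.sum(w >= t), 1)
        if false_est / discoveries <= alpha:
            return np.flatnonzero(w >= t), t
    # No threshold meets the target level: make no discoveries.
    return np.array([], dtype=int), np.inf

# Toy input: ten strong signals followed by two null-like hypotheses.
z1 = [5, 4, 6, 5, 4.5, 5.5, 4, 6, 5, 7, 0.1, -0.2]
z2 = [5, 4, 6, 5, 4, 5, 6, 4, 5, 6, -0.1, 0.3]
selected, threshold = symmetric_fdr_select(z1, z2, alpha=0.25)
```

The loop returns the smallest threshold whose estimated false discovery proportion falls below `alpha`, which maximizes the number of discoveries at that level. On the toy input it selects the ten strong signals and rejects neither of the two null-like hypotheses.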
